Coding of sound files

For individual words, a subset of phenomena was coded (see individual corpus descriptions for further details). The coding scheme is phonemic and specifies the following linguistic variables: target phoneme/grapheme; preceding and following phoneme/grapheme; stress; and position in the word. A complete list of the values for each of these variables is available in [RPD_Coding_Linguistic variables.pdf].

The following coding conventions were adopted:

  1. Coding of 'Target phoneme' / 'Preceding phoneme' / 'Following phoneme': coding is based on phonemic (i.e. dictionary) transcription. For example, French <r> is coded /R/ even though non-uvular fricative realizations may be encountered (NB. SAMPA symbols [SAMPA.pdf] are used here and elsewhere). Similarly, Spanish <b> is coded phonemically /b/ in spite of the possibility of approximant realizations in some environments.

  2. Pauses between target phoneme and preceding/following phoneme: If the target sound is preceded or followed by a pause in the particular sound file, 'Preceding phoneme'/'Following phoneme' is coded as '#' (pause).

  3. Coding 'Target / Preceding / Following Grapheme':

  4. Stress: stress is coded based on (i) the syllable in which the sound occurs and (ii) the location of this syllable vis-à-vis the syllable bearing main (tonic) stress (in examples below, the syllable for which the coding is valid is in bold and the stressed syllable is underlined).

  5. Position in word: this is based on the phonetic transcription and not the orthographic form.
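
As an illustration of conventions 1 and 2, the following minimal Python sketch derives the 'Preceding phoneme' / 'Target phoneme' / 'Following phoneme' values for one segment. It is not part of the database tooling: the input format (a list of SAMPA symbols in which None marks a pause), the names SegmentContext and code_context, and the decision to treat the edge of the sound file like a pause are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

PAUSE = "#"  # symbol used in the conventions above for a pause


@dataclass(frozen=True)
class SegmentContext:
    """Hypothetical record holding the three phoneme-context values."""
    preceding: str
    target: str
    following: str


def code_context(segments: List[Optional[str]], index: int) -> SegmentContext:
    """Return the phonemic context of segments[index].

    `segments` holds phonemic (dictionary) SAMPA symbols in order;
    None marks a pause (hypothetical input format).
    """
    target = segments[index]
    if target is None:
        raise ValueError("the target must be a phoneme, not a pause")

    def neighbour(i: int) -> str:
        # Assumption: the start/end of the sound file counts as a pause.
        if i < 0 or i >= len(segments) or segments[i] is None:
            return PAUSE
        return segments[i]

    return SegmentContext(neighbour(index - 1), target, neighbour(index + 1))


# French <rat>, phonemically /Ra/, recorded between two pauses.
rat = [None, "R", "a", None]
print(code_context(rat, 1))
```

Here the target /R/ is coded with '#' as its preceding phoneme and /a/ as its following phoneme, whatever the phonetic realization of <r> in the recording.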

