

Starting a new GPC analysis project

So, you want to start a new GPC (grapheme-phoneme correspondence) analysis of a language? Great! Depending on the complexity of the writing system, it can be a big job. But the work you put into it will pay dividends by making the words of the language accessible for computer-based literacy acquisition activities within SynPhony.

Data you need to get started:

  1. a large word list (if the language uses a non-Latin script, the list should be in Unicode)
  2. a list of the alphabetic characters used to write the language
  3. a list of the digraphs or multigraphs used in the spelling system (a digraph is 2 letters that represent 1 sound, e.g. 'th' in English; a multigraph is more than 2 letters, e.g. the 'eigh' in 'eight')
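
As a rough illustration of why the digraph and multigraph list matters, the following Python sketch splits a word into graphemes by always taking the longest listed multigraph first. The function name `segment` and the short `MULTIGRAPHS` list are made up for this example, and greedy longest-match is only one simple strategy, not necessarily how SynPhony itself works.

```python
def segment(word, multigraphs):
    """Split a word into graphemes, preferring the longest listed multigraph."""
    # Try longer multigraphs first so 'eigh' wins over any shorter overlap.
    ordered = sorted((m for m in multigraphs if m), key=len, reverse=True)
    graphemes = []
    i = 0
    while i < len(word):
        for mg in ordered:
            if word.startswith(mg, i):
                graphemes.append(mg)
                i += len(mg)
                break
        else:
            # No multigraph matches here, so take a single character.
            graphemes.append(word[i])
            i += 1
    return graphemes


MULTIGRAPHS = ["th", "sh", "eigh"]  # illustrative only, not a full English list
print(segment("eight", MULTIGRAPHS))
print(segment("this", MULTIGRAPHS))
```

A greedy split like this will get some words wrong (for instance, 't' and 'h' that meet across a syllable boundary), so its output is a starting point for manual review rather than a finished analysis.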

Great to have if you can get it:

If the writing system is complex or opaque, you will benefit from computer-readable pronunciation information for each word (or as many as possible) in some kind of phonetic transcription.
Where can you find this information? I don't have the answer to that. Try searching the internet, asking the education department of a university that teaches the language, asking on an online linguistics forum, etc. It might take a while to track down (if it exists at all), but it will be worth it.
Ideally, you should be a native speaker of the language you are analyzing. If that is not the case, you should at least have access to a native speaker, as native speakers have all the pronunciation rules built into their heads and can make appropriate decisions and spot inconsistencies much more quickly and accurately.

Computer tools you will need:

Text editor (preferably one with macro capability)

Database program

Scripting program

Concordance program

Starting analysis using Toolbox

I use Toolbox to manage my GPC database. It is a flat-file database program designed specifically for developing dictionaries, but it lends itself to almost any kind of data because it does not dictate any specific field markers. I also use UltraEdit as my text editor and Consistent Changes as my scripting program. Consistent Changes is flexible and is a good way to make systematic changes in your database.

When you have a word list, you can create a Toolbox lexicon file. Toolbox reads standard text files that contain text markers to indicate fields. Each field marker (a backslash followed by a short code) must start on a new line and is separated from the data by a space. Field markers are usually 1-4 characters long, but could be longer. Each record in the database starts with the field marker that is designated as the record marker. For dictionary files you can use \lx as the record marker, but you could use anything. So, for example, several records that contain only words would look like this:


\lx word1
\lx word2
\lx word3

However, we want to add more information than just the words. If we want to add part-of-speech data, pronunciation, and our grapheme-phoneme analysis to this file, we can put them in additional fields like this:


\lx word1
\ps noun
\ph phonetic_form_in_ipa
\gpc this is where the gpc form of word1 would occur

\lx word2
\ps verb
\ph phonetic_form_in_ipa
\gpc this is where the gpc form of word2 would occur
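
If you ever need to process such a file outside of Toolbox, the layout is simple enough to read with a few lines of code. The Python sketch below is only an illustration of the marker-per-line format described above; the function name `parse_records` is invented here, and the sketch ignores any line that does not begin with a backslash.

```python
def parse_records(text, record_marker="lx"):
    """Parse backslash-marked lines into a list of records.

    Each record is a list of (marker, value) pairs, in file order.
    """
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            # Blank lines and unmarked lines are ignored in this sketch.
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            records.append([])          # a record marker opens a new record
        if records:                     # skip fields before the first record
            records[-1].append((marker, value.strip()))
    return records


sample = """\\lx word1
\\ps noun

\\lx word2
\\ps verb
"""
for record in parse_records(sample):
    print(dict(record))
```

Keeping each record as a list of pairs preserves field order and allows a marker to appear more than once in a record; the `dict(...)` call in the usage lines is only for compact display.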

You could use a plain text editor to edit a file like this. However, Toolbox offers several features that make it a tool of choice for this kind of work. You can filter for data in any field, make changes such as search and replace that are restricted to one field only, and you have a good export utility. Even so, I often edit my Toolbox files with a plain text editor outside of Toolbox when that is the easier or better tool for a particular edit (best if it has macro capability).

In addition to Toolbox, I also use a scripting program called Consistent Changes that can make changes to my database. It has an easy syntax and is quite powerful. I can create scripts for you if you let me know what kind of changes you need. For English, I used this program to create a script that did a lot of the GPC analysis for me. The script could only do this because I had a fairly good phonetic form available in the same file as the words; it would have been impossible if that information had been in another file. Even then, a lot of work was left to do manually.

A sample record from my database looks like this:


\lx should've
\ps
\cmu SH UH1 D AH0 V
\cvc cvcvc
\str 10
\websyl 'shou.ld've
\ph ʃʊˈdǝ.v
\wid 27525
\nt insertion
\syll 2
\gpc sh_sh,book_ou,d_ld,',v_ve
\ss
\sd
\cpwd 00008
\et
\exp
\cob _
\mr 202 @+ve
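
One simple consistency check on a record like this can be sketched in Python. It rests on an assumption about the \gpc layout that is only inferred from the sample above: units are separated by commas, each unit is a sound key, an underscore, and then the grapheme, and a unit without an underscore (such as the apostrophe) is a literal character. The function names are invented for this example.

```python
def gpc_graphemes(gpc_field):
    """Return the grapheme part of each comma-separated GPC unit."""
    graphemes = []
    for unit in gpc_field.split(","):
        # Assumed layout: sound key, underscore, grapheme. Units with no
        # underscore (such as the apostrophe) are taken literally.
        _key, sep, grapheme = unit.partition("_")
        graphemes.append(grapheme if sep else unit)
    return graphemes


def gpc_matches_word(word, gpc_field):
    """True if the graphemes, joined in order, spell the headword."""
    return "".join(gpc_graphemes(gpc_field)) == word


print(gpc_graphemes("sh_sh,book_ou,d_ld,',v_ve"))
print(gpc_matches_word("should've", "sh_sh,book_ou,d_ld,',v_ve"))
```

Running a check like this over every record is a cheap way to catch \gpc fields where a letter was dropped or duplicated during manual editing.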

As you can see, some fields are blank and some have data. This is a part of life in data management, and the blanks can be filled in over time. The kinds of fields and the data they contain are completely up to you. Each field contains one kind of data, and you can decide:

1) which characters to use to name that field and

2) what kind of data you will put in there.

If the alphabet contains non-Roman characters, make sure that you use UTF-8 as your encoding. When you get a word list, you can start with a plain list of one word per line. Then, with a text editor, you can add your field codes using search and replace: search for every new-line character and replace it with the new-line character followed by the field marker (for example \lx) and a space. The first line has no new-line character before it, so add its marker by hand.
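
The same step can be sketched in Python for anyone who prefers a script to a search and replace. The function name `wordlist_to_records` is invented for this example, and it assumes one word per line with \lx as the record marker.

```python
def wordlist_to_records(text, record_marker="lx"):
    """Turn one-word-per-line text into one record-marker line per word."""
    words = (raw.strip() for raw in text.splitlines())
    # Empty lines are dropped; every remaining word gets its own marker line.
    return "".join(f"\\{record_marker} {word}\n" for word in words if word)


print(wordlist_to_records("cat\n\ndog\n"), end="")
```

When reading and writing real files, open them with `encoding="utf-8"` so that non-Roman characters survive the round trip.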

Next, you can add extra fields to every record by searching for the record marker and replacing it with the fields you want to add. Then, with a plain text editor (preferably one with macro capability), you can copy the word in the record marker field into the \gpc field. Once it is there, we can start to manipulate it to do our GPC analysis.

You can also copy the word from the record field to the \gpc field using a Consistent Changes script. I use this program a lot and can write you special scripts if needed. As the need arises I will add useful Consistent Changes scripts linked to this page for starting a project.

I would suggest that the minimum set of field markers for such a project be as follows:

 

\lx word
\ps part_of_speech_info
\ph phonetic_form (if the writing system is not transparent)
\syll number_of_syllables
\str stress_pattern_of_the_word
\freq how_often_the_word_appears_in_print
\gpc grapheme_phoneme_representation_of_the_word

 

You may add additional fields if you wish.
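
As a sketch of how a starting record with these fields might be generated, the Python snippet below builds one record per word, leaves every field empty except \gpc, and seeds \gpc with a copy of the word so the analysis can begin there. The names `FIELDS` and `new_record` are invented for this example; adjust the marker list to match your own project.

```python
FIELDS = ["ps", "ph", "syll", "str", "freq", "gpc"]  # markers suggested above


def new_record(word, fields=FIELDS):
    """Build one record: \\lx plus empty fields, with \\gpc seeded from the word."""
    lines = [f"\\lx {word}"]
    for marker in fields:
        # Only the gpc field starts with data; the rest are filled in later.
        lines.append(f"\\{marker} {word}" if marker == "gpc" else f"\\{marker}")
    return "\n".join(lines) + "\n"


print(new_record("word1"))
```

Writing the output of `new_record` for each word, with a blank line between records, gives a file laid out like the examples earlier on this page.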