From SurveyWiki


WordSurv helps determine linguistic relationships by aiding the comparison of word lists. It functions in three main areas: (1) entry and maintenance of word lists and cognate decisions, (2) computation of lexicostatistic and phonostatistic measures of similarity, and (3) output of data and results in various formats. WordSurv also supports the COMPASS algorithm to aid in comparative reconstruction.
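As an illustration of the lexicostatistic measure mentioned above, here is a minimal sketch of how a percentage of shared cognates between two word lists might be computed. It is not WordSurv's actual code or file format: the function name `cognate_similarity`, the dictionary layout (gloss mapped to a cognate-set label), and the sample judgments are all assumptions made for the example.

```python
def cognate_similarity(judgments_a, judgments_b):
    """Percentage of glosses present in both lists whose cognate-set labels match.

    Each argument maps a gloss to the cognate-set label assigned by the analyst
    (a hypothetical layout, not WordSurv's own storage format).
    """
    shared = [gloss for gloss in judgments_a if gloss in judgments_b]
    if not shared:
        return 0.0
    matches = sum(1 for gloss in shared if judgments_a[gloss] == judgments_b[gloss])
    return 100.0 * matches / len(shared)


# Invented cognate decisions for two varieties.
variety_a = {"water": "A", "fire": "A", "stone": "B", "dog": "A"}
variety_b = {"water": "A", "fire": "B", "stone": "B", "tree": "A"}

print(round(cognate_similarity(variety_a, variety_b), 1))
```

Only glosses elicited in both varieties are counted, so a missing item lowers the number of comparisons rather than counting as a non-cognate.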

WordSurv is being developed through a partnership between the Computer and Systems Science Department at Taylor University and SIL International. Version 7 is available here.

For detailed instructions about inputting word lists into WordSurv 6.0.2, see Inputting Data into WordSurv.

More information, release notes, documentation, and downloads of this free application are available via SIL's WordSurv pages.


Palmsurv is designed to enable word lists to be elicited directly onto a PalmOS handheld computer. For more information about Palmsurv, visit Sourceforge.


Wordcorr is a computational tool that assists linguists in comparing natural languages systematically. It splits the work involved in applying the Comparative Method. The user identifies patterns in the data that may show common origin. The software keeps track of user judgments in a framework from which the user can draw evidence about the development of later languages from earlier ones. Wordcorr does not perform the analysis for the user; it organizes the evidence that the user considers relevant to the analysis.

For more, visit the Wordcorr website.

IPA Help

Use IPA Help to learn to hear IPA sounds by clicking their symbols in the IPA chart, to test recognition of phones, and to hear IPA sounds in context with example data. It is very useful for practicing particular sounds before carrying out word list data collection, and you can isolate particular sound sets to focus on. More information is available from SIL International.

Phonology Assistant

Phonology Assistant helps users keep track of phonetic data from the keyboard and from Speech Analyzer sound recordings. It uses standard IPA characters to index and display data. The program makes it easy to search lists for phonetic patterns in any number of segment combinations or positions. For more, see SIL International.

Speech Analyzer

Speech Analyzer performs fundamental frequency, spectrographic and spectral analysis, and duration measurements on sound recordings. You can add phonemic, orthographic, tone, and gloss transcriptions to phonetic transcriptions in an interlinear format. Other features include slowed playback, repeat loops and overlays to assist with perception and mimicry of sounds.

For more, see SIL International's Speech Analyzer page.

Epi Info

Epi Info is a powerful piece of software that does much more than a surveyor would ever need. But it can help you develop a questionnaire or form, customize the data entry process, and enter and analyze data. You can then produce statistics, tables, graphs, and maps from your data.

For more, see the Epi Info page on the website of the Centers for Disease Control and Prevention.


Gabmap is a web-based application of the RuG/L04 software. It was developed recently by the University of Groningen dialectometrists who created the older RuG/L04 (the basis of Rugloafer). The goal of Gabmap is to make dialectometry (the quantitative measurement of dialect difference) more accessible and easier to use for dialectologists who are not computer specialists.

Basically, Gabmap processes word lists with Levenshtein distance to measure phonetic distance. Then it produces multi-dimensional scaling (MDS) plots, hierarchical clustering, and dialect maps, and it also offers data mining for determining what makes a cluster distinct. It’s free, fast, and easy to use; it has default settings based on best practices; it organizes and saves the diagrams, maps, and files it produces online; and it has fairly complete manuals and tutorials.
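To make the Levenshtein step concrete, here is a minimal sketch of the general technique: plain edit distance over characters, normalized by the longer word and averaged over the items two sites share. It is not Gabmap's implementation, which may weight and tokenize segments differently. The function names `levenshtein` and `site_distance` and the sample transcriptions are invented for the example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def site_distance(words_a, words_b):
    """Mean length-normalized edit distance over items recorded at both sites."""
    scores = []
    for item, word_a in words_a.items():
        word_b = words_b.get(item)
        if word_b is None:
            continue  # item missing at one site: skip it
        longest = max(len(word_a), len(word_b))
        scores.append(levenshtein(word_a, word_b) / longest if longest else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Invented transcriptions for two sites (assumption, not real survey data).
site_a = {"water": "wɔtər", "milk": "mɪlk"}
site_b = {"water": "watər", "milk": "mɛlk"}

print(site_distance(site_a, site_b))
```

Computing `site_distance` for every pair of sites gives a distance matrix, which is the kind of input that MDS and hierarchical clustering work from.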

Here are a few things that Gabmap does much better than the old RuG/L04:

  • Gabmap has a page for cluster determinants. You can select a cluster and look at what makes it distinct; basically, it's identifying isoglosses for you. I (Cathryn Yang) tried it on Nisu (see paper available via this page) and Lalo (see 14 KB PDF at this link), and it found many of the isoglosses that I had identified manually (after weeks of screen-staring and hair-tearing).
  • Gabmap can handle data in Unicode (I think Rugloafer can do this also).
  • Gabmap tokenizes the data in a simple way that forces vowels to align with vowels, and consonants with consonants.
  • Gabmap can produce simple dialect maps if you upload a .kmz or .kml file, and they teach you how to extract a map from Google Earth.
  • Gabmap has a page dedicated to cluster validation. Because clustering techniques are pretty unstable, they have a page where you can easily compare the clustering with the MDS plot, to check if the clustering is grounded in reality.
  • Gabmap has a page for "fuzzy" clustering: they developed a technique that adds “noise” to the clustering, which makes the clustering more stable.
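The tokenization point in the list above can be illustrated with a small sketch. This is one way such a constraint could work, not Gabmap's actual tokenizer: combining diacritics are attached to their base character, and a vowel-consonant substitution is priced higher than a deletion plus an insertion, so the alignment never pairs a vowel with a consonant. The vowel inventory `VOWELS` and the function names are assumptions for the example.

```python
import unicodedata

# Hypothetical vowel inventory; a real survey would list its own symbols.
VOWELS = set("aeiouɑɛɪɔʊəæyø")


def tokenize(word):
    """Split a word into segments, keeping combining diacritics with their base."""
    tokens = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and tokens:
            tokens[-1] += ch  # attach the diacritic to the preceding segment
        else:
            tokens.append(ch)
    return tokens


def is_vowel(token):
    return token[0] in VOWELS


def constrained_distance(a, b):
    """Edit distance over segments that never substitutes a vowel for a consonant."""
    ta, tb = tokenize(a), tokenize(b)
    prev = list(range(len(tb) + 1))
    for i, x in enumerate(ta, start=1):
        curr = [i]
        for j, y in enumerate(tb, start=1):
            if x == y:
                sub = 0
            elif is_vowel(x) == is_vowel(y):
                sub = 1
            else:
                sub = 3  # costlier than delete + insert (2), so never chosen
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + sub))
        prev = curr
    return prev[-1]


print(tokenize("pã"))
print(constrained_distance("pat", "pit"))
```

Note that this sketch only groups Unicode combining marks. Spacing modifiers such as ʰ are separate characters and would need their own rule.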

One drawback: Gabmap requires the data to be formatted so that the data points are the rows and the lexical items are the columns, which is not the way I usually organize my word lists. But, just by using the Paste Special > Transpose option in Excel, I got my data formatted correctly within seconds.

Another drawback, pointed out by Chad White: in older versions of Excel (2003/4), a worksheet is limited to 256 columns. So if your word list is longer than that, you’ll get an error when transposing. However, the Groningen dialectometrists have been getting consistently good results (in terms of phonetic distance) with a much smaller data set, around 100 words. My Nisu and Lalo data sets were both around 200 items, and I still got results consistent with my findings based on much longer word lists (Nisu: 400; Lalo: 1,001). Plus, newer versions of Excel don’t have this limit.
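If a spreadsheet's column limit gets in the way, the same transposition can be done outside Excel. The following is a minimal sketch using Python's standard `csv` module. The function name `transpose_rows` and the sample data are made up, and the exact file format Gabmap expects (delimiter, encoding, header layout) is not covered here and should be checked against its manual.

```python
import csv
import io


def transpose_rows(rows):
    """Swap rows and columns, padding short rows with empty cells first."""
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [""] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


# Invented word list with lexical items as rows and data points as columns.
source = io.StringIO("gloss,Village1,Village2\nwater,wɔtər,watər\nmilk,mɪlk,mɛlk\n")
rows = list(csv.reader(source))

# After transposing, each data point is a row and each lexical item a column.
output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(transpose_rows(rows))
print(output.getvalue())
```

To work with real files, replace the `io.StringIO` objects with files opened using `open(path, newline="", encoding="utf-8")`.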

I first sent this review to Chad White, developer of Rugloafer, to get his take on Gabmap. He was enthusiastic about Gabmap’s functionality and wished to encourage people to use it. Here are some of Chad’s comments:

In tokenizing the data it handles diacritics as well. That was one troublesome thing about the way we had been doing it.

You can see a map of the distribution of individual items across a dataset, i.e., the distribution of one variant of the word banana. I don't know how many people would want to know about that feature, but I can think of times when that would have come in handy.

All these maps and data manipulations are downloadable in a variety of formats, making use in a paper very easy.


SonicStage is a media editor that enables you to transfer files to and from Sony MiniDisc players. Available through SurveyWiki: SonicStage v4.3 installer and converter for SonicStage.