Overview of commercial and free thesauri
Free Thesauri
Many thesaurus projects are available as free software. Although most are not as well known as their proprietary counterparts, they are often just as feature-rich. The aforementioned WordNet is the result of a research project of the same name at Princeton University, which has been working on an English lexical database for several decades.
The database groups nouns, verbs, adjectives, and adverbs at the semantic and lexical level. The project forms the basis for comparative linguistics and natural language processing, and is therefore the basis for several of the programs presented here.
In addition to a web-based interface, the current release is available for various platforms (as an Ubuntu package [24], among others). It includes both the wn command-line program and a graphical application called WordNet Browser.
With the query in Listing 3, you can look up the synonyms and meanings of the noun "fair." In the -synsn option, syns selects synonyms and the trailing n restricts the search to nouns. The wnb command starts the GUI program, where you type the word in the search box at the top left.
Listing 3
Synonyms for "fair"
$ wn fair -synsn

Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun fair

4 senses of fair

Sense 1
carnival, fair, funfair
       => show

Sense 2
fair
       => gathering, assemblage

Sense 3
fair
       => exhibition, exposition, expo

Sense 4
bazaar, fair
       => sale, cut-rate sale, sales event
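The output in Listing 3 is regular enough to process in a script. The following Python sketch shows one possible way to split it into senses, synonyms, and hypernyms. The function name parse_wn_senses and the result layout are illustrative assumptions, not part of WordNet, and the parser only handles the layout shown in Listing 3; other wn options may produce lines it does not expect.

```python
import re
from typing import Dict, List

# Output of "wn fair -synsn" as shown in Listing 3 (prompt line omitted).
SAMPLE = """\
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun fair

4 senses of fair

Sense 1
carnival, fair, funfair
       => show

Sense 2
fair
       => gathering, assemblage

Sense 3
fair
       => exhibition, exposition, expo

Sense 4
bazaar, fair
       => sale, cut-rate sale, sales event
"""


def split_terms(text: str) -> List[str]:
    """Split a comma-separated line into trimmed, non-empty terms."""
    return [term.strip() for term in text.split(",") if term.strip()]


def parse_wn_senses(text: str) -> List[Dict[str, List[str]]]:
    """Parse wn synonym output into one dict per sense (hypothetical helper)."""
    senses: List[Dict[str, List[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"Sense \d+", line):
            # A "Sense N" header starts a new entry.
            current = {"synonyms": [], "hypernyms": []}
            senses.append(current)
        elif current is None or not line:
            # Skip the preamble and blank separator lines.
            continue
        elif line.startswith("=>"):
            # Lines introduced by "=>" list the hypernyms of the sense.
            current["hypernyms"].extend(split_terms(line[2:]))
        else:
            # Any other line inside a sense lists its synonyms.
            current["synonyms"].extend(split_terms(line))
    return senses


if __name__ == "__main__":
    for number, sense in enumerate(parse_wn_senses(SAMPLE), start=1):
        print(number, sense["synonyms"], "=>", sense["hypernyms"])
```

To work on live data instead of the embedded sample, the text could be captured from the wn program, for example with Python's subprocess module, provided the WordNet package is installed.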
Below the input box, four buttons appear, one for each part of speech available for the word. To restrict the list to synonyms for nouns, click the "Noun" button and select "Synonyms, ordered by estimated frequency" from the list. The result (Figure 4) is identical to the output on the command line.
Several programming interfaces exist for WordNet and are listed on the project website. For Perl, the best choice is the WordNet-QueryData module [25], which is available as the Ubuntu package libwordnet-querydata-perl .
For Python, the Python Natural Language Toolkit (NLTK) is a good choice [26]. It includes a corpus reader class for WordNet.
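As a minimal sketch of how the NLTK route might look, the following function collects the synonyms and direct hypernyms for each noun sense of a word. It relies on the synsets(), lemma_names(), and hypernyms() calls of NLTK's WordNet corpus reader; the function name noun_senses and the result layout are illustrative assumptions. The reader is passed in as a parameter, so the function itself does not import NLTK.

```python
from typing import Any, Dict, List


def noun_senses(word: str, wordnet: Any) -> List[Dict[str, List[str]]]:
    """Return synonyms and direct hypernyms for each noun sense of `word`.

    `wordnet` is expected to behave like NLTK's WordNet corpus reader
    (nltk.corpus.wordnet). This helper is a hypothetical convenience
    wrapper, not part of NLTK.
    """
    senses = []
    # pos="n" restricts the lookup to nouns (the value of wordnet.NOUN).
    for synset in wordnet.synsets(word, pos="n"):
        senses.append({
            # WordNet stores multiword lemmas with underscores.
            "synonyms": [name.replace("_", " ")
                         for name in synset.lemma_names()],
            "hypernyms": [name.replace("_", " ")
                          for hypernym in synset.hypernyms()
                          for name in hypernym.lemma_names()],
        })
    return senses


# Usage with NLTK (assumes the nltk package is installed and the WordNet
# corpus has been fetched once with nltk.download("wordnet")):
#
#   from nltk.corpus import wordnet
#   for sense in noun_senses("fair", wordnet):
#       print(sense["synonyms"], "=>", sense["hypernyms"])
```

The WordNet release bundled with NLTK may differ from the one installed by the distribution package, so the senses and their order need not match the command-line output exactly.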
Kthesaurus
Kthesaurus (Figure 5) provides similar functions for the Calligra Suite (formerly KOffice) as OpenThesaurus does for LibreOffice.
The lexical information is drawn from the WordNet database. Because of this, Kthesaurus is only available in English. To use the software, install the package for your distribution.
In the box at the top left, first enter the word you want and click the Search button to search the database. Then, under the Thesaurus tab, you will see three columns filled with synonyms (column 1), hypernyms (column 2), and hyponyms (column 3).
The Replace button replaces the word in the text with the selected entry (note that this is only possible if you have started Kthesaurus from within Calligra). You can change the search term by double-clicking one of the entries in the columns. By using the tabs, you can switch between the original search and the entry from the WordNet database.
Figure 6 displays the overview for the word "help," sorted according to the average frequency. Use the drop-down box to get more information in the form in which it is stored in the WordNet database (i.e., by compound words, synonyms, antonyms, and everyday words).