Linguakit and the keyword extractors

  • 27 June, 2014
  • Linguakit

Linguakit continues to expand.

At Cilenis we keep working to make Linguakit the favourite site of linguists, journalists, editors, publishers, teachers, students and anyone else who works with language. To that end, Linguakit now includes two more linguistic tools: a keyword extractor and a multiword extractor, a rare tool in this field. With them you can pull the most relevant information out of a written text.

When you enter a text in the keyword extractor or the multiword extractor on Linguakit, you automatically get its main words or its main word groups, sorted in descending order of relevance. These terms are also highlighted in the text, so the extraction comes with a visual view of where they occur.

How do these tools work?

To produce this ranking, the keyword extractor relies on a model that combines observed frequencies and expected frequencies. The system weighs each word in the text using statistical tests: it compares the word’s observed frequency in the text with its expected frequency, that is, the frequency the word would have in an ideal or reference corpus.
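As a rough illustration of comparing observed and expected frequencies, here is a minimal Python sketch. It is not Linguakit's actual code: the post does not say which statistical test is used, so the score below (one term of a log-likelihood ratio) is an assumption, and the function name `keyword_scores`, the `reference_freq` table and the `smoothing` value are hypothetical.

```python
import math
import re
from collections import Counter


def keyword_scores(text, reference_freq, smoothing=1e-6):
    """Rank the words of `text` by how far they exceed their expected count.

    `reference_freq` maps a word to its relative frequency in a reference
    corpus (hypothetical data supplied by the caller). Words missing from
    it fall back to `smoothing`, an arbitrary small value.
    """
    tokens = re.findall(r"\w+", text.lower())
    observed = Counter(tokens)
    total = len(tokens)
    scores = {}
    for word, obs in observed.items():
        # Count the word would have if it behaved as in the reference corpus.
        expected = total * reference_freq.get(word, smoothing)
        # Assumed measure: one term of a log-likelihood ratio. It is positive
        # when the word is more frequent here than the reference predicts.
        scores[word] = 2 * obs * math.log(obs / expected)
    # Highest score first, matching the descending order of relevance.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


reference = {
    "the": 0.3, "and": 0.1, "parser": 0.001, "reads": 0.001,
    "grammar": 0.001, "builds": 0.001, "tree": 0.001,
}
ranked = keyword_scores(
    "The parser reads the grammar and the parser builds the tree", reference
)
```

With this toy reference table, "parser" ranks first because it appears twice in a short text but is rare in the reference, while "the" scores low because its high count is roughly what the reference predicts.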

For the multiword extractor, the strategy is different. It combines two steps. First, it identifies the “candidate” multiword terms, which must match a grammatical pattern: noun-preposition-noun, adjective-noun or noun-adjective. Second, it ranks them from highest to lowest relevance according to statistical association measures.
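The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is already part-of-speech tagged, the tag names (`NOUN`, `ADJ`, `PREP`) are hypothetical, and pointwise mutual information stands in for the association measures, which the post does not name.

```python
import math
from collections import Counter

# The three grammatical patterns listed above, written with assumed tag names.
PATTERNS = {("NOUN", "PREP", "NOUN"), ("ADJ", "NOUN"), ("NOUN", "ADJ")}


def multiword_candidates(tagged):
    """Step 1: yield word tuples whose tag sequence matches a pattern."""
    for size in (2, 3):
        for start in range(len(tagged) - size + 1):
            window = tagged[start:start + size]
            if tuple(tag for _, tag in window) in PATTERNS:
                yield tuple(word.lower() for word, _ in window)


def rank_multiwords(tagged):
    """Step 2: rank candidates by an association score, highest first."""
    word_counts = Counter(word.lower() for word, _ in tagged)
    total = sum(word_counts.values())
    candidate_counts = Counter(multiword_candidates(tagged))
    scores = {}
    for candidate, count in candidate_counts.items():
        # Probability of the words co-occurring if they were independent.
        independent = math.prod(word_counts[w] / total for w in candidate)
        # Assumed measure: pointwise mutual information, extended to n words.
        scores[" ".join(candidate)] = math.log((count / total) / independent)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tagged_text = [
    ("the", "DET"), ("keyword", "NOUN"), ("extractor", "NOUN"),
    ("uses", "VERB"), ("statistical", "ADJ"), ("tests", "NOUN"),
    ("and", "CONJ"), ("degree", "NOUN"), ("of", "PREP"),
    ("relevance", "NOUN"),
]
multiwords = rank_multiwords(tagged_text)
```

On this tiny input the candidates are "statistical tests" and "degree of relevance". Pointwise mutual information favours rare and longer sequences, so a real system would likely need more text and possibly a different measure.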

The two extractors also differ in the information the system gives you. When you enter a text in the keyword extractor, the first thing the system returns is a word cloud. It is built from the most important words in the text, shown in different colours and sizes according to their relevance. You choose how many words appear in the cloud.
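One simple way to turn relevance into cloud sizes is a linear scale, sketched below in Python. How Linguakit actually sizes its cloud is not described in this post, so the function `cloud_sizes`, its parameters and the point sizes are all hypothetical.

```python
def cloud_sizes(ranked, top_n=10, min_size=12, max_size=48):
    """Map the first `top_n` (word, score) pairs to font sizes in points.

    `ranked` is assumed to be sorted by descending relevance; `top_n`
    plays the role of the number of words the user chooses to display.
    """
    top = ranked[:top_n]
    if not top:
        return {}
    scores = [score for _, score in top]
    low, high = min(scores), max(scores)
    # Avoid dividing by zero when every selected word has the same score.
    span = (high - low) or 1.0
    return {
        word: round(min_size + (score - low) / span * (max_size - min_size))
        for word, score in top
    }


sizes = cloud_sizes([("parser", 20.0), ("tree", 10.0), ("the", 0.0)], top_n=2)
```

Here only the two most relevant words are kept: the top word gets the largest size and the last selected word the smallest.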

Linguakit Keyword extractor

What is the use of these extractors?

These two linguistic tools are very useful for detecting topics quickly and automatically, which greatly simplifies document classification and labelling. If you need to extend the keyword search to terms and topics that take more than one word to express, you can do so with the multiword extractor. Combining the two tools gives a much more powerful result.

Linguakit is now a keyword extractor as well as a verb conjugator, a sentiment analyzer and a language identifier. Make the most of your resources and become a Linguakit user.

Enjoy the benefits of using this website.