AUTOMATIC FORMATION OF DICTIONARIES OF THE SUBJECT AREA

Keywords

semantic connectedness, word embedding, word2vec, ontology, stemming, lemmatization

Abstract

This paper considers the problem of building subject-area dictionaries (groups) of natural-language words (concepts) from the contents of text documents. It also investigates how the semantic coherence of words can be identified from the relationships between concepts. The documents are first split into clusters, the features are then ranked according to the clustering results, and semantically related words are selected for each topic.
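The pipeline outlined in the abstract (cluster the documents, rank the features, pick related words per topic) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `topic_dictionaries`, whitespace tokenization, cluster labels supplied by some earlier clustering step, and the scoring rule (the share of documents inside a cluster that contain a word minus the share outside it). The abstract does not state the ranking criterion the authors actually use.

```python
from collections import Counter, defaultdict


def topic_dictionaries(docs, labels, top_k=3):
    """Return {cluster_label: [top_k words]} ranked by a simple contrast score.

    Hypothetical helper: the scoring rule below is an illustrative assumption,
    not the ranking criterion used in the paper.
    """
    overall = Counter()                 # document frequency over all documents
    per_cluster = defaultdict(Counter)  # document frequency inside each cluster
    cluster_sizes = Counter(labels)

    for text, label in zip(docs, labels):
        words = set(text.lower().split())  # naive whitespace tokenization
        overall.update(words)
        per_cluster[label].update(words)

    n_docs = len(docs)
    result = {}
    for label, counts in per_cluster.items():
        size = cluster_sizes[label]
        rest = n_docs - size
        scores = {}
        for word, inside in counts.items():
            outside = overall[word] - inside
            # Share of in-cluster documents containing the word
            # minus the share of out-of-cluster documents containing it.
            scores[word] = inside / size - (outside / rest if rest else 0.0)
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:top_k]
    return result


docs = [
    "the bank raised the loan interest rate",
    "the loan interest was paid to the bank",
    "the team won the football match",
    "the football team lost the match",
]
labels = [0, 0, 1, 1]  # assumed output of an earlier clustering step
dictionaries = topic_dictionaries(docs, labels, top_k=3)
```

In this toy input, a word such as "the" occurs in every document and scores zero, so it does not enter either dictionary, while words confined to one cluster rank highest. A realistic setup would apply stemming or lemmatization before counting and would obtain the labels from an actual clustering algorithm.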
