A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus.
In: International Journal of Computers, Communications & Control, Vol. 15 (2020-04-01), Issue 2, pp. 1-16
Online
academicJournal
It is hardly feasible to process the huge amount of structured and semi-structured data with manual methods. This study aims to solve the problem of processing such large volumes of data with machine learning algorithms. We collected text data on the company's public opinion with web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into different topics. The topic keywords are then used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were compared in new word discovery tests. The experimental results show that the Word2vec algorithm based on the machine learning model achieves the highest accuracy, recall, and F-value. [ABSTRACT FROM AUTHOR]
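To make the PMI baseline and the evaluation metrics named in the abstract concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: it scores adjacent token pairs by pointwise mutual information, PMI(a, b) = log2(p(a, b) / (p(a) p(b))), keeps frequent high-PMI pairs as candidate new words, and scores the candidates against a reference word list with the standard precision, recall, and F1 definitions. The function names, the `min_count` and `min_pmi` thresholds, and the toy corpus are hypothetical; the abstract does not state how the paper defines "accuracy" or which thresholds it used.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(a, b): PMI} for every adjacent token pair in `tokens`.

    Probabilities are plain relative frequencies (no smoothing):
    p(a) over all unigrams, p(a, b) over all adjacent pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def discover_words(tokens, min_count=2, min_pmi=1.0):
    """Hypothetical filter: keep pairs seen at least `min_count` times
    whose PMI is at least `min_pmi`, joined into one candidate word.

    Joining with "" assumes character-level tokens (an assumption of
    this sketch, not something the abstract specifies).
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {
        a + b
        for (a, b), pmi in bigram_pmi(tokens).items()
        if pair_counts[(a, b)] >= min_count and pmi >= min_pmi
    }


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall and F1 (harmonic mean)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy = list("abxabyabzcd")          # toy character stream
    found = discover_words(toy)        # only "ab" is frequent and cohesive
    print(found, precision_recall_f1(found, {"ab", "cd"}))
```

On the toy stream, "ab" occurs three times and has PMI of about 2.01, so it is the only candidate kept; against the reference set {"ab", "cd"} this gives precision 1.0, recall 0.5, and F1 of about 0.667. PMI rewards pairs that co-occur more often than chance, which is why a frequency floor is usually combined with it: a pair seen once between two rare tokens can otherwise score very high.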
Copyright of International Journal of Computers, Communications & Control is the property of Fundatia Agora.
| Title: | A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus. |
|---|---|
| Author(s): | Luo, J.; Yu, D.; Dai, Z. |
| Journal: | International Journal of Computers, Communications & Control, Vol. 15 (2020-04-01), Issue 2, pp. 1-16 |
| Published: | 2020 |
| Media type: | academicJournal |
| ISSN: | 1841-9836 (print) |
| DOI: | 10.15837/ijccc.2020.2.3811 |