ISSN 1119-4618
 

Original Research 


A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement

Kayode Samuel Olaseni, Salisu Aliyu, Kareem Bakare.

Abstract
Document clustering is the task of grouping documents into clusters such that documents within a cluster are similar to one another but different from documents in other clusters. Several studies have addressed news article clustering using various similarity measures; however, the performance of these measures is limited on high-dimensional data. In this paper, we present a fuzzy c-means clustering technique that uses N-grams together with an improved similarity measure, referred to as the ‘improved sqrt-cosine similarity measurement’, to compute distances between documents. Natural Language Processing techniques are applied to the 20 Newsgroups dataset, and the pre-processed data are converted into a feature vector model using Term Frequency-Inverse Document Frequency (TF-IDF). The improved sqrt-cosine similarity measurement is used to compute the distances between news articles, and clustering is performed using the fuzzy c-means algorithm. The proposed technique was evaluated against existing techniques using accuracy and purity as evaluation metrics, and it outperformed the existing methods, producing clusters with higher accuracy and purity.

Key words: Clustering; Similarity Measurement; Data Mining; N-grams; Knowledge Discovery
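The clustering step described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. It assumes the commonly used definition of the improved sqrt-cosine similarity for non-negative vectors, ISC(x, y) = Σ_i √(x_i y_i) / (√(Σ_i x_i) · √(Σ_i y_i)), and takes D = 1 − ISC as the dissimilarity fed to fuzzy c-means (an assumption). Because 1 − ISC equals the squared Hellinger distance between the L1-normalised vectors, it is treated like a squared distance in the membership update. Cluster centers use the standard membership-weighted mean, which is a heuristic when the dissimilarity is not squared Euclidean. TF-IDF and N-gram feature extraction are not shown: the input rows stand in for non-negative TF-IDF vectors. The function names `isc`, `fuzzy_c_means` and `purity`, the fuzzifier m = 2, and the toy data are illustrative choices.

```python
import math
import random
from collections import Counter


def isc(x, y):
    """Improved sqrt-cosine similarity of two non-negative vectors (assumed definition):
    sum_i sqrt(x_i * y_i) / (sqrt(sum_i x_i) * sqrt(sum_i y_i))."""
    num = sum(math.sqrt(a * b) for a, b in zip(x, y))
    den = math.sqrt(sum(x)) * math.sqrt(sum(y))
    return num / den if den > 0 else 0.0


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means with D = 1 - ISC as the dissimilarity (hypothetical sketch).

    Returns (u, centers), where u[i][j] is the membership of document i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Center step: membership-weighted mean (weights u^m) of the documents.
        # Means of non-negative vectors stay non-negative, so ISC remains defined.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / w_sum
                            for k in range(dim)])
        # Membership step: 1 - ISC behaves like a squared distance, so
        # u_ij = 1 / sum_k (D_ij / D_ik) ** (1 / (m - 1)).
        new_u = []
        for i in range(n):
            d = [max(1.0 - isc(data[i], centers[j]), 1e-12) for j in range(c)]
            new_u.append([1.0 / sum((d[j] / d[k]) ** (1.0 / (m - 1.0)) for k in range(c))
                          for j in range(c)])
        # Stop when no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return u, centers


def purity(labels_true, labels_pred):
    """Share of documents that carry the majority true label of their assigned cluster."""
    per_cluster = {}
    for t, p in zip(labels_true, labels_pred):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values())
    return majority / len(labels_true)


# Toy usage: four documents over four terms (hypothetical TF-IDF-like weights).
docs = [[3.0, 2.0, 0.0, 0.0], [2.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 2.0], [0.0, 0.0, 2.0, 3.0]]
true_labels = ["sport", "sport", "tech", "tech"]
u, centers = fuzzy_c_means(docs, c=2)
# Harden fuzzy memberships by taking each document's highest-membership cluster.
hard = [max(range(len(row)), key=row.__getitem__) for row in u]
print(hard, purity(true_labels, hard))
```

Purity in this sketch assigns each document to its highest-membership cluster and counts the documents that match the majority true class of that cluster. The abstract does not state how fuzzy memberships were hardened for evaluation, so the argmax step is an assumption.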


How to Cite this Article
Pubmed Style

Olaseni KS, Aliyu S, Bakare K. A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement. JPAS. 2022; 22(1): 21-27. doi:10.5455/sf.olasamkay


Web Style

Olaseni KS, Aliyu S, Bakare K. A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement. https://www.atbuscienceforum.com/?mno=75049 [Accessed: November 12, 2022]. doi:10.5455/sf.olasamkay


Vancouver/ICMJE Style

Olaseni KS, Aliyu S, Bakare K. A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement. JPAS. (2022), [cited November 12, 2022]; 22(1): 21-27. doi:10.5455/sf.olasamkay



Harvard Style

Olaseni, K. S., Aliyu, S. & Bakare, K. (2022) A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement. JPAS, 22 (1), 21-27. doi:10.5455/sf.olasamkay



Turabian Style

Olaseni, Kayode Samuel, Salisu Aliyu, and Kareem Bakare. 2022. "A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement." Science Forum (Journal of Pure and Applied Sciences) 22 (1): 21-27. doi:10.5455/sf.olasamkay



Chicago Style

Olaseni, Kayode Samuel, Salisu Aliyu, and Kareem Bakare. "A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement." Science Forum (Journal of Pure and Applied Sciences) 22, no. 1 (2022): 21-27. doi:10.5455/sf.olasamkay



MLA (The Modern Language Association) Style

Olaseni, Kayode Samuel, Salisu Aliyu, and Kareem Bakare. "A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement." Science Forum (Journal of Pure and Applied Sciences) 22.1 (2022): 21-27. Print. doi:10.5455/sf.olasamkay



APA (American Psychological Association) Style

Olaseni, K. S., Aliyu, S., & Bakare, K. (2022). A Fuzzy C-means News Article Clustering Based on an Improved Sqrt-Cosine Similarity Measurement. Science Forum (Journal of Pure and Applied Sciences), 22(1), 21-27. doi:10.5455/sf.olasamkay