Classification of Twitter Trending Issues Through Three Clustering Methods
Keywords
Trending topic, Twitter (X), K-Means, DBSCAN, LDA
Abstract
Twitter is one of the most dynamic social media platforms, providing real-time information through its trending topics feature, which reflects the most talked-about issues among users. In Indonesia, however, trending topics are often dominated by entertainment, celebrity gossip, or light-hearted viral content, and are rarely used to highlight or analyze more substantial social issues. This study aims to classify Twitter trending topics in Indonesia using three clustering algorithms: K-Means, DBSCAN, and Latent Dirichlet Allocation (LDA). Data were collected over a certain period and passed through a text preprocessing stage before the clustering algorithms were applied. The results show that LDA without keyword filtering provides the most relevant and dominant topic classification: in the bar chart, topic 0 dominates with as many as 160 topics, and its main cluster relates to the Indonesian presidential election. These findings suggest that LDA outperforms K-Means and DBSCAN in identifying latent topic structures in Twitter data. This study contributes to a better understanding of trending topics and supports data-driven public opinion analysis and decision-making.
