February 28, 2024 | Open Access

Analysis and Discussion of Public Sports Data based on Clustering Model

Abstract

As the amount and diversity of data in modern society grow rapidly, effective analysis and understanding of that data have become important tasks. This paper clusters public data to identify problems with data quality and consistency and seeks ways to improve them. The data used were general physical education data provided by data.go.kr. Nouns were extracted from the text data, and clustering was performed on TF-IDF vectors. Performance was evaluated by comparing the K-means, DBSCAN, and GMM algorithms and several keyword extraction methods, and problems with data consistency and quality were analyzed. The results confirmed that stopword processing and the choice of keyword extraction method had a significant impact on clustering results. Data length, format, and keyword quality also affected clustering performance. The study concludes that data imbalance, lack of consistency, and lack of standards can affect clustering results, and that standardized guidelines and further research are needed to solve these problems. By using clustering to characterize the diversity of the data, the paper suggests ways to improve data collection and analysis strategies, and it emphasizes the importance of improving data quality and actively applying clustering techniques for the effective use of public data.
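The reported effect of stopword processing on TF-IDF-based clustering can be illustrated with a minimal sketch. It is not the paper's pipeline: the four toy documents, the stopword set, and the helper names `tfidf` and `cosine` are all assumptions made for illustration, and the documents are assumed to be already reduced to noun tokens. The sketch uses only the Python standard library and a smoothed IDF, and compares cosine similarity between documents on the same topic and on different topics, with and without stopword removal.

```python
import math
from collections import Counter

def tfidf(docs, stopwords=frozenset()):
    """Return L2-normalised TF-IDF vectors for already-tokenised documents."""
    docs = [[t for t in d if t not in stopwords] for d in docs]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF, so terms present in every document still get weight 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical noun tokens; "information" and "data" stand in for generic
# words that appear in every record regardless of topic.
docs = [
    ["information", "data", "soccer", "stadium", "league"],
    ["information", "data", "soccer", "league", "referee"],
    ["information", "data", "swimming", "pool", "lane"],
    ["information", "data", "swimming", "pool", "lifeguard"],
]
stopwords = frozenset({"information", "data"})  # assumed stopword list

raw = tfidf(docs)
clean = tfidf(docs, stopwords)

within_raw, cross_raw = cosine(raw[0], raw[1]), cosine(raw[0], raw[2])
within_clean, cross_clean = cosine(clean[0], clean[1]), cosine(clean[0], clean[2])

print(f"raw:   same-topic={within_raw:.3f}  cross-topic={cross_raw:.3f}")
print(f"clean: same-topic={within_clean:.3f}  cross-topic={cross_clean:.3f}")
```

In this toy setting, the generic words give unrelated documents a nonzero similarity, and removing them widens the gap between same-topic and cross-topic similarity. Any distance-based algorithm such as K-means or DBSCAN would see better-separated groups as a result. Real public datasets are noisier, so the size of the effect there is not something this sketch can predict.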
