Automated data exploration is useful for evaluating key aspects of populations such as young adults (which here refers to the youth population in the United States represented by students in grades 9 through 12). This article shows how Principal Component Analysis (PCA) can be used for this exploration. PCA applies to data sets that record m attributes for each of n individuals (generally n >> m). For analytical purposes, the data can be viewed as n points in an m-dimensional Euclidean space with Cartesian coordinates, where each of the m perpendicular axes corresponds to one attribute. When m is large, the points become difficult to visualize; PCA, as a dimensionality reduction method, makes that visualization tractable. The objective of this article is to identify relationships between attributes when there is a primary attribute of interest. The work first describes some of the main theoretical aspects of PCA and then applies PCA to a data set as a practical example. The data come from the publicly available results of a 2023 survey, administered to a nationally representative sample of students in the United States, that assesses health risk behaviors among young adults. The survey was conducted by the Youth Risk Behavior Surveillance System (YRBSS), which is managed by the Centers for Disease Control and Prevention (CDC). The results reveal, in graphical form, relationships between specific data attributes. The reliability of the results is then discussed, considering: (1) recommendations taken from the PCA literature, and (2) the use of a graphical tool called a Zoning Biplot, an improved form of displaying PCA results.
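The geometric picture described above (n points on m perpendicular attribute axes, reduced to a few dimensions for plotting) can be illustrated with a minimal sketch. It assumes NumPy, computes PCA through the singular value decomposition of the centered data, and runs on synthetic values rather than the YRBSS data. The function name `pca_scores` and the choice to center without rescaling are assumptions made for illustration, not the procedure used in this article.

```python
import numpy as np

def pca_scores(X, k=2):
    """Project an n x m data matrix onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # center each attribute (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]              # coordinates of the n individuals
    loadings = Vt[:k].T                    # m x k directions in attribute space
    explained = s**2 / np.sum(s**2)        # share of variance per component
    return scores, loadings, explained[:k]

# Synthetic example (hypothetical data): n = 200 individuals, m = 6 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)   # make two attributes correlated

scores, loadings, explained = pca_scores(X, k=2)
print(scores.shape, loadings.shape)
print(explained)
```

The rows of `scores` are the points one would plot in two dimensions, and the rows of `loadings` are the attribute directions that a biplot overlays on those points. Whether to also rescale each attribute to unit variance before the decomposition depends on the data and is left open here.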
This work is relevant because it uses the Zoning Biplot, proposed by the authors, which shows more detail in the results than a conventional Biplot. The authors argue that this added detail yields valid results for a larger number of data sets, including the one in the example presented, and they present a graphical development to support the concept and the advantage of a Zoning Biplot.
Ambrosio-Lucas et al. (Mon,) studied this question.