May 5, 2026 · Open Access

Detection of exoplanets from TESS imaging data using unsupervised machine learning techniques

Key Points

The study aims to evaluate unsupervised machine learning techniques for identifying exoplanets from TESS data.
Analyzed light curves of galactic stellar populations using k-means and k-medians clustering.
Employed dimensionality reduction methods (t-SNE and UMAP) to project high-dimensional feature data into interpretable low-dimensional spaces.
Compared the clusters with the TESS Objects of Interest (TOI) catalog to assess whether they contain undiscovered exoplanet candidates.
Found that clusters containing known TOIs also included additional unlabeled objects, suggesting potential new exoplanet candidates.
The clustering distinguished transit-like signals from noise-dominated light curves, even in sectors with few or no known TOIs.
The proposed framework offers a scalable way to prioritize targets for automated exoplanet detection pipelines in large survey datasets.
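The summary does not state which statistical features or which clustering implementation the study used. As a rough illustration of the general idea only, the sketch below computes three hypothetical per-light-curve features (standard deviation, skewness, and the deepest point below the median in units of the median absolute deviation), z-scores them, and groups the curves with a minimal NumPy k-medians (L1 distance, coordinate-wise median centres, farthest-first initialisation). The function names `basic_features` and `k_medians`, the feature choices, and the synthetic light curves are all assumptions, not the authors' pipeline.

```python
import numpy as np


def basic_features(flux):
    """Hypothetical per-light-curve statistics: spread, skewness, dip depth."""
    flux = np.asarray(flux, dtype=float)
    med = np.median(flux)
    std = flux.std()
    # Robust scatter (median absolute deviation); small epsilon avoids /0.
    mad = np.median(np.abs(flux - med)) + 1e-12
    skew = np.mean(((flux - flux.mean()) / (std + 1e-12)) ** 3)
    # Deepest point below the median in units of MAD; transits push this up.
    dip = (med - flux.min()) / mad
    return np.array([std, skew, dip])


def k_medians(X, k, n_iter=100, seed=0):
    """Cluster the rows of X with k-medians (L1 distance, median centres)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: random first centre, then repeatedly
    # take the point with the largest L1 distance to its nearest centre.
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([np.abs(X - c).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres)
    labels = None
    for _ in range(n_iter):
        # L1 distance of every point to every centre, shape (n_points, k).
        dist = np.abs(X[:, None, :] - centres[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = np.median(members, axis=0)
    return labels, centres


# Synthetic demo: 20 flat noisy curves and 20 curves with box-shaped dips.
rng = np.random.default_rng(1)
curves = []
for i in range(40):
    flux = 1.0 + 0.001 * rng.standard_normal(1000)
    if i >= 20:
        for start in (100, 450, 800):
            flux[start:start + 20] -= 0.01  # 1% transit-like dip
    curves.append(flux)

F = np.array([basic_features(f) for f in curves])
F = (F - F.mean(axis=0)) / F.std(axis=0)  # z-score each feature
labels, _ = k_medians(F, k=2)
print(labels)
```

On this toy data the dipped curves and the flat curves should fall into separate clusters. The median-based centre update is the main difference from k-means: it is less sensitive to a few extreme feature values, which is one common reason to try k-medians alongside k-means.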

Abstract

The identification of exoplanets within habitable zones remains a central objective in modern astrophysics, particularly now that large-scale photometric datasets are available from space-based missions such as the Transiting Exoplanet Survey Satellite (TESS). This study investigates the effectiveness of unsupervised machine learning techniques, specifically k-means and k-medians clustering, for analyzing and classifying light curves derived from galactic stellar populations. Basic and extended statistical features are extracted from the light curves, and dimensionality reduction methods, including t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), are employed to project the resulting high-dimensional data into interpretable low-dimensional spaces. To evaluate the relevance of the identified clusters, the results are systematically compared with the TESS Objects of Interest (TOI) catalog, incorporating information on confirmed planets and candidate signals. This comparison reveals that clusters containing known TOIs often include additional unlabeled objects, suggesting the presence of potentially undiscovered exoplanet candidates. Moreover, the clustering framework effectively distinguishes transit-like signals from noise-dominated light curves, even in sectors with few or no known TOIs. These findings highlight the capability of unsupervised learning to recover known exoplanetary signals while also identifying new candidate-rich regions within the data. The proposed framework offers a scalable, data-driven approach for prioritizing targets in large survey datasets, contributing to the advancement of automated exoplanet detection pipelines.
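The abstract does not say how the cluster-to-catalog comparison is computed. One simple way to quantify it, assuming each light curve carries a TIC identifier and the TOI catalog has been reduced to a set of TIC IDs, is to count the known TOIs in each cluster and list the unlabeled members of TOI-rich clusters as follow-up candidates. The function names `toi_enrichment` and `follow_up_candidates`, the `min_fraction` threshold, and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def toi_enrichment(labels, tic_ids, toi_ids):
    """Summarise how known TOIs are distributed across clusters.

    labels  : cluster label per light curve
    tic_ids : TIC identifier per light curve (same order as labels)
    toi_ids : set of TIC identifiers present in the TOI catalog
    Returns {label: (n_members, n_toi, toi_fraction, unlabeled_ids)}.
    """
    members = defaultdict(list)
    for lab, tic in zip(labels, tic_ids):
        members[lab].append(tic)
    summary = {}
    for lab, tics in sorted(members.items()):
        known = [t for t in tics if t in toi_ids]
        unlabeled = [t for t in tics if t not in toi_ids]
        summary[lab] = (len(tics), len(known), len(known) / len(tics), unlabeled)
    return summary


def follow_up_candidates(summary, min_fraction=0.2):
    """Unlabeled objects in clusters whose TOI share meets a (hypothetical) threshold."""
    out = []
    for n, n_toi, frac, unlabeled in summary.values():
        if n_toi > 0 and frac >= min_fraction:
            out.extend(unlabeled)
    return sorted(out)


# Toy example: cluster 0 holds two known TOIs plus one unlabeled object.
labels = [0, 0, 0, 1, 1, 1, 1]
tic_ids = [101, 102, 103, 201, 202, 203, 204]
toi_ids = {101, 102}
summary = toi_enrichment(labels, tic_ids, toi_ids)
print(follow_up_candidates(summary))  # [103]
```

Object 103 is flagged because it shares a cluster with two known TOIs, while cluster 1, which contains no TOIs, contributes nothing. A threshold on the TOI fraction is only one possible ranking rule; the study's actual criterion is not described here.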