What question did this study set out to answer?

The aim is to automate the extraction and classification of observational data from GCN Circulars using large language models.

March 2, 2026 (Open Access)

Large Language Model–driven Analysis of General Coordinates Network (GCN) Circulars

Key Points

Utilized large language models to parse transient reports from GCN Circulars.
Developed a neural topic modeling pipeline for clustering and summarization of astrophysical topics.
Implemented a system to extract gamma-ray burst redshift information without training.
Evaluated the system's accuracy against the Neil Gehrels Swift Observatory GRB table.
Achieved 97.2% accuracy for redshift-containing Circulars using the Mistral model.
Successfully retrieved 96.8% of redshift Circulars from a manually curated archive.
Demonstrated the effectiveness of automated text mining in astrophysical research.

Abstract

The General Coordinates Network (GCN) is NASA’s time-domain and multimessenger alert system. GCN distributes two data products: automated “Notices” and human-generated “Circulars” that report the observations of high-energy and multimessenger astronomical transients. The flexible and nonstructured format of GCN Circulars, comprising more than 40,500 Circulars accumulated over three decades, makes it challenging to manually extract observational information, such as redshift or observed wave bands. In this work, we employ large language models (LLMs) to facilitate the automated parsing of transient reports. We develop a neural topic modeling pipeline with open-source tools for the automatic clustering and summarization of astrophysical topics in the Circulars archive. Using neural topic modeling and contrastive fine-tuning, we classify Circulars based on their observation wave bands and messengers. Additionally, we separate gravitational-wave event clusters and their electromagnetic counterparts from the Circulars archive. Finally, using the open-source Mistral model, we implement a system to automatically extract gamma-ray burst (GRB) redshift information from the Circulars archive, without the need for any training. Evaluation against the manually curated Neil Gehrels Swift Observatory GRB table shows that our simple system, with the help of prompt-tuning, output parsing, and retrieval-augmented generation (RAG), can achieve an accuracy of 97.2% for redshift-containing Circulars. Our neural search-enhanced RAG pipeline accurately retrieved 96.8% of redshift Circulars from the manually curated archive. Our study demonstrates the potential of LLMs to automate and enhance astronomical text mining and provides a foundational work for future advances in transient alert analysis.
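The output-parsing and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the function names `parse_redshift` and `accuracy`, the tolerance, and the toy burst labels are all assumptions made for illustration. It only shows one plausible way to pull a numeric redshift out of a free-text model response and score it against a reference table.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern (an assumption, not taken from the study): matches
# phrasings such as "z = 1.23", "z~0.5", or "redshift of 3.4".
_REDSHIFT_RE = re.compile(
    r"(?:\bz\s*(?:=|~|≈)\s*|\bredshift\s+(?:of|is)\s+)(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_redshift(response: str) -> Optional[float]:
    """Return the first redshift value found in a model response, or None."""
    match = _REDSHIFT_RE.search(response)
    return float(match.group(1)) if match else None


def accuracy(
    predicted: Dict[str, Optional[float]],
    reference: Dict[str, float],
    tol: float = 1e-3,
) -> float:
    """Fraction of reference entries whose predicted redshift agrees within tol."""
    if not reference:
        return 0.0
    hits = sum(
        1
        for name, z_ref in reference.items()
        if predicted.get(name) is not None and abs(predicted[name] - z_ref) <= tol
    )
    return hits / len(reference)


# Toy, made-up values purely to exercise the functions (not real measurements).
responses = {
    "burst_a": "The afterglow spectrum gives z = 1.23 from absorption lines.",
    "burst_b": "No redshift reported.",
}
predicted = {name: parse_redshift(text) for name, text in responses.items()}
reference = {"burst_a": 1.23, "burst_b": 2.0}
score = accuracy(predicted, reference)
```

In practice a parser of this kind would likely need to handle uncertainties, photometric versus spectroscopic redshifts, and upper limits, none of which this sketch attempts.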
