What question did this study set out to answer?

How can disruptive and developmental research be identified effectively from paper content, using machine learning, deep learning, and large language models?

January 20, 2026

Utilizing large model for content-based identification of disruptive and developmental research

Key points

Aim: identify disruptive and developmental research directly from paper content.
Built content-based identification models using machine learning, deep learning, and large language models.
The best-performing model was Mistral-7B fine-tuned via QLoRA.
Evaluated the model on three purpose-built evaluation datasets.
Achieved an F1 score of 0.7735, outperforming the machine learning and deep learning models.
Distinguished Nobel Prize papers from randomly paired counterparts.
Found a strong positive correlation between predicted scores and future scientific impact.
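
For readers unfamiliar with the metric behind the 0.7735 figure, the following is a minimal sketch of an F1 computation over the three classes named in the abstract (disruptive, developmental, general). The summary does not say which averaging scheme the authors used, so macro-averaging here is an assumption, and the function names and toy labels are hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels for illustration only; these are not data from the study.
LABELS = ["disruptive", "developmental", "general"]
y_true = ["disruptive", "developmental", "general", "general"]
y_pred = ["disruptive", "general", "general", "general"]
score = macro_f1(y_true, y_pred, LABELS)
```

Macro-averaging weights each class equally, which matters when one class (such as general research) dominates the sample.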

Abstract

The expeditious identification of papers with potentially disruptive and developmental contributions is a meaningful issue in science of science research. To achieve this aim, we proposed an effective method to automatically build an abstract format-balanced dataset based on the basic idea of disruption index. Subsequently, we respectively utilized a range of machine learning models (MLs), deep learning models (DLs), and large language models (LLMs) to build a content-based identification model based on the dataset. The optimal model, Mistral-7B fine-tuned via QLoRA, significantly outperforms MLs and DLs, and achieved an F1 score of 0.7735 on this extremely challenging task. Hence, the model can promptly and effectively distinguish between disruptive, developmental, and general research based purely on research content. Further, we developed three evaluation datasets to scrutinize the effectiveness of the model, demonstrating its remarkable ability to distinguish Nobel Prize papers from randomly paired counterparts, as well as to differentiate between randomly sampled papers in journals of varying impact factors (IF). Additionally, the papers’ scores derived from our model’s prediction exhibit a strong positive correlation with their future scientific impact. We also reveal that evaluating papers based solely on IF is insufficient, and papers boasting high view counts are more likely to represent potentially disruptive or developmental research. Thus, our work not only provides an effective method for timely identifying high-quality research, but also provides guidance for promoting fair scientific evaluation.
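
The abstract says the dataset was built on the basic idea of the disruption index but does not spell the index out. As a minimal sketch, assuming the commonly used citation-based definition D = (n_i - n_j) / (n_i + n_j + n_k), the calculation could look like the following. The function name, argument names, and toy identifiers are hypothetical, and the authors' actual labeling rule for disruptive versus developmental papers is not given in this summary.

```python
def disruption_index(focal_citers, reference_citers):
    """Disruption index of a focal paper (assumed standard definition).

    focal_citers: set of ids of later papers that cite the focal paper.
    reference_citers: set of ids of later papers that cite at least one
        of the focal paper's references.
    Returns a value in [-1, 1], or None when no later paper cites either.
    """
    n_i = len(focal_citers - reference_citers)  # cite the focal paper only
    n_j = len(focal_citers & reference_citers)  # cite focal and its references
    n_k = len(reference_citers - focal_citers)  # cite the references only
    total = n_i + n_j + n_k
    if total == 0:
        return None
    return (n_i - n_j) / total


# Toy citation sets for illustration only.
focal_citers = {"p1", "p2", "p3", "p4"}
reference_citers = {"p4", "p5"}
d = disruption_index(focal_citers, reference_citers)  # (3 - 1) / 5
```

Under this definition, values near 1 indicate that later work cites the focal paper while ignoring its predecessors (disruptive), and values near -1 indicate that later work cites both together (developmental).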


Cite This Study

Huang et al. studied this question.

synapsesocial.com/papers/696ed06d6d8d470fca57abab https://doi.org/10.1007/s11192-025-05513-w
