What question did this study set out to answer?

The aim is to develop a compact language model tailored for STEM education that maintains accuracy while being resource-efficient.

March 21, 2026 | Open Access

Nano large language model of 125 million-parameter for STEM education

Key Points

The aim is to develop a compact language model tailored for STEM education that maintains accuracy while being resource-efficient.
Developed a 125-million-parameter Nano LLM based on a Transformer architecture.
Trained using knowledge distillation on a custom corpus of B.Tech CSE curriculum data.
Employed model compression techniques to enhance performance despite small size.
Conducted evaluations against other established models such as GPT-2 small and BERT-base.
Achieved 72.8% accuracy in in-domain queries, outperforming similar-scale models.
Demonstrated effective coherence and accuracy, supported by tailored training data.
Successfully ran on consumer-grade hardware, indicating applicability for offline use.
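
The key points name knowledge distillation as the training method but do not give the loss. As a rough illustration only, the sketch below shows one common formulation (a temperature-softened KL term against the teacher plus a cross-entropy term against the true label). The function names, the `temperature` and `alpha` values, and the toy logits are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with a hard-label cross-entropy term.

    temperature and alpha are hypothetical defaults, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student's unsoftened prediction against the label
    hard = -math.log(softmax(student_logits)[target_index])
    # The T^2 factor keeps the soft term's gradient scale comparable across T
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


# Toy three-token vocabulary: the student disagrees with the teacher
loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], target_index=2)
print(round(loss, 4))
```

With `alpha=1.0` and identical student and teacher logits the loss is zero, which is a quick sanity check that the soft term only penalizes disagreement with the teacher.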

Abstract

Large language models (LLMs) have achieved remarkable success on various natural language tasks, but their immense size often makes deployment on resource-constrained devices impractical. This paper presents Lightweight Nano LLM, a compact 125-million-parameter Transformer-based language model tailored for domain-specific question answering. Built on a GPT-2 small architecture, the proposed model is trained via knowledge distillation on a custom B.Tech CSE curriculum corpus (3 million tokens) to inject deep-domain knowledge. The study emphasizes techniques from prior literature, such as model compression and curated training data that enable small models to punch above their weight in coherence and accuracy. In evaluations, Lightweight Nano LLM demonstrates near-perfect accuracy on in-domain queries, outperforming other models of similar scale (e.g., GPT-2 small, DistilGPT-2, BERT-base) in this specialized task. However, the proposed model has outperformed all the above compared models with 72.8% accuracy. The model’s compact size and focused training also allow it to run efficiently on consumer hardware (e.g., Apple M1), highlighting the promise of small LLMs for personalized and offline applications. The proposed work presents a detailed literature review, comparative analysis with related models, and an architectural diagram. The results show that with appropriate training data and design, smaller truly can be smarter in specialized settings for STEM disciplines.
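
To make the "125-million-parameter" figure concrete, the sketch below counts the weights of a GPT-2 small style network. The hyperparameters (vocabulary of 50,257 tokens, 1,024 positions, hidden width 768, 12 blocks, output layer tied to the token embedding) are the publicly documented GPT-2 small configuration and are assumed here; the abstract does not state the exact configuration of Lightweight Nano LLM, and `gpt2_param_count` is a name made up for this example.

```python
def gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12):
    """Count trainable parameters of a GPT-2 style decoder with tied output weights."""
    # Token embedding table plus learned position embedding table
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Each LayerNorm has a scale and a bias vector
    layer_norm = 2 * n_embd
    # Attention: fused Q/K/V projection, then output projection (weights + biases)
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Feed-forward: expand to 4x width, then project back (weights + biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    per_block = 2 * layer_norm + attention + mlp
    # Final LayerNorm after the last block; the output head reuses the token embedding
    return embeddings + n_layer * per_block + layer_norm


total = gpt2_param_count()
print(f"{total:,} parameters")            # 124,439,808 parameters
print(f"{total * 4 / 1e6:.0f} MB in float32")
```

Under these assumptions the count is about 124.4 million, which rounds to the 125 million quoted above, and the weights occupy roughly half a gigabyte in 32-bit floats, consistent with the claim that a model of this size fits on consumer hardware.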

Cite This Study

Pradhan et al. studied this question.

synapsesocial.com/papers/69be34f26e48c4981c67323b https://doi.org/10.1007/s10791-026-10033-z
