What question did this study set out to answer?

The study asks how to automate the development of declarative tools, such as industry-specific dictionaries, for the semantic analysis of scientific and technical documents.

June 3, 2026

Automation of the Development of a Set of Declarative Tools for a System of Semantic Analysis of Industry-Specific Scientific and Technical Documents

Key Points

Aimed to automate the development of declarative tools for a system of semantic analysis of scientific and technical documents.
Analyzed scientific and technical texts from VINITI RAS databases using phraseological conceptual text analysis (PCTA).
Compiled statistical data on lexical and conceptual-terminological compositions across various subject areas.
Developed specialized algorithms for creating industry-specific dictionaries based on frequency data.
Developed declarative tools for morphological, conceptual, and semantic-syntactic analysis of polythematic texts.
Identified dictionary compositions ensuring maximum text coverage for industry-specific corpora.
Results are intended to automate document classification, coordinate indexing, abstracting, and semantic search at VINITI RAS.

Abstract

The paper considers methods and technologies for the automated development of declarative tools for a system of semantic analysis of scientific and technical documents, based on an analysis of the lexical and conceptual-terminological composition of scientific and technical texts in the VINITI RAS databases. Statistical data on the frequency composition of industry-specific text corpora were compiled using phraseological conceptual text analysis (PCTA), statistical analysis of the lexical and conceptual-terminological composition of texts across a wide range of subject areas, and the principle of linguistic analogy. These data became the basis for the automated development of a set of declarative tools for morphological, conceptual, and semantic-syntactic analysis of polythematic texts. Based on statistical data from frequency dictionaries of word forms and standard word forms, the composition and scope of dictionaries that ensure maximum coverage of industry-specific text corpora were identified. The algorithms for creating a set of industry-specific dictionaries were developed using the authors’ own tools, which enable the dictionaries to be constructed with minimal effort. Based on the research findings, industry-specific declarative tools were developed for the morphological, conceptual, and semantic-syntactic analysis of polythematic texts. The results of the research and dictionary development will be used to automate key technological tasks at VINITI RAS, such as document classification, coordinate indexing, abstracting, and semantic search, as well as future-oriented tasks involving the creation of industry-specific ontologies and knowledge bases.
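
The idea of sizing a dictionary from a frequency dictionary of word forms so that it reaches a target corpus coverage can be illustrated with a minimal sketch. This is not the authors' tooling: the function names `frequency_dictionary` and `smallest_dictionary_for_coverage`, the simple letter-based tokenizer, and the greedy "most frequent first" selection are all assumptions made for illustration, and the sample corpus is invented.

```python
import re
from collections import Counter


def frequency_dictionary(texts):
    """Count word forms across a corpus (hypothetical helper).

    Tokenization is an assumption: lowercase the text and keep
    runs of Unicode letters, dropping digits and punctuation.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts


def smallest_dictionary_for_coverage(counts, target):
    """Pick the most frequent word forms until their combined share
    of all tokens in the corpus reaches `target` (a value in 0..1).

    Returns the selected word forms and the coverage they achieve.
    """
    total = sum(counts.values())
    selected, covered = [], 0
    for word, freq in counts.most_common():
        if total == 0 or covered / total >= target:
            break
        selected.append(word)
        covered += freq
    coverage = covered / total if total else 0.0
    return selected, coverage


# Toy corpus, invented for the example.
corpus = ["the catalyst the reactor", "the catalyst alloy"]
counts = frequency_dictionary(corpus)
words, coverage = smallest_dictionary_for_coverage(counts, target=0.7)
print(words, round(coverage, 3))
```

Taking word forms in descending frequency order yields the smallest dictionary for a given token coverage, which is why a frequency dictionary is a natural starting point for deciding dictionary scope. A real pipeline would also need lemmatization and multi-word term handling, which this sketch leaves out.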
