This paper considers methods and technologies for the automated development of declarative tools for a system of semantic analysis of scientific and technical documents, based on the lexical and conceptual-terminological composition of scientific and technical texts in the VINITI RAS databases. Statistical data on the frequency composition of industry-specific text corpora were compiled using phraseological conceptual text analysis (PCTA), statistical analysis of the lexical and conceptual-terminological composition of texts across a wide range of subject areas, and the principle of linguistic analogy. These data served as the basis for the automated development of a set of declarative tools for the morphological, conceptual, and semantic-syntactic analysis of polythematic texts. Statistics from frequency dictionaries of word forms and standard word forms were used to determine the composition and size of the dictionaries that ensure maximum coverage of the industry-specific text corpora. The algorithms for building the set of industry-specific dictionaries were developed with the authors’ own tools, which allow the dictionaries to be constructed with minimal effort. The resulting industry-specific declarative tools support the morphological, conceptual, and semantic-syntactic analysis of polythematic texts. The research results and the dictionaries will be used to automate key technological tasks at VINITI RAS, such as document classification, coordinate indexing, abstracting, and semantic search, as well as longer-term tasks involving the creation of industry-specific ontologies and knowledge bases.
Kan et al. studied this question.
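The idea of choosing dictionary size from frequency data can be illustrated with a minimal sketch. It is not the authors' tooling: the function names `frequency_dictionary` and `smallest_covering_dictionary`, the regex tokenizer, the toy corpus, and the 0.7 coverage target are all assumptions made for illustration, and no reduction to standard word forms is attempted.

```python
import re
from collections import Counter


def frequency_dictionary(texts):
    """Count word forms (lowercased tokens) across a corpus of strings."""
    counts = Counter()
    for text in texts:
        # Simplifying assumption: a word form is any run of word characters.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def smallest_covering_dictionary(counts, target_coverage):
    """Pick the most frequent word forms until they account for at least
    `target_coverage` (a share between 0 and 1) of all tokens."""
    total = sum(counts.values())
    selected, covered = [], 0
    for word, freq in counts.most_common():
        if covered >= target_coverage * total:
            break
        selected.append(word)
        covered += freq
    return selected, (covered / total if total else 0.0)


# Hypothetical toy corpus standing in for an industry-specific text corpus.
corpus = [
    "semantic analysis of scientific texts",
    "statistical analysis of technical texts",
    "analysis of texts",
]
counts = frequency_dictionary(corpus)
dictionary, coverage = smallest_covering_dictionary(counts, 0.7)
print(len(dictionary), round(coverage, 3))
```

On real corpora the same cumulative-coverage curve would presumably be computed per subject area, so that dictionary sizes can be compared across industry-specific corpora.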