The Geo Big Data Open Platform operated by the Korea Institute of Geoscience and Mineral Resources (KIGAM) integrates diverse geoscience resources, including geological maps, thematic maps, reports, metadata, and frequently asked questions, and provides public access through web services and open APIs. Despite this rich content, practical access through natural-language queries remains limited because most records are organized around structured metadata and domain-specific terminology. This study presents GeoBot, a conversational artificial intelligence service for the platform, and describes the construction of a structured Korean question-answer (QA) dataset designed to support semantic retrieval, geospatial filtering, and retrieval-augmented response generation. The retrieval corpus was assembled from open platform metadata, geo data, paper metadata, report metadata, and FAQ content, while the released QA records were derived from 1,200 real user queries collected during the beta service. The workflow comprised source-data collection, schema validation, metadata normalization, coordinate transformation, sentence embedding, vector indexing, similarity retrieval, and large language model-based answer generation. During preprocessing, fragmented metadata records were merged into document-level objects, coordinate strings were converted into GeoJSON point or polygon objects in WGS84, and normalized text fields were embedded using a Korean sentence-embedding model based on KoSimCSE-RoBERTa. The resulting documents were indexed in Elasticsearch using dense_vector and geo_shape fields to enable joint evaluation of semantic similarity and spatial intersection. The final dataset contains 1,200 structured Korean QA records, including raw questions, refined questions, keywords, evidence-grounded answers, spatial information, and source identifiers. Expert evaluation showed a factual consistency rate of 96.83%, demonstrating the dataset's reliability for geoscience conversational services.
Han et al. (Tue,) studied this question.
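
The preprocessing and indexing steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The field names (`text`, `embedding`, `location`), the `"lon lat; lon lat"` coordinate-string format, the assumption that coordinates are already in WGS84, and the 768-dimensional embedding size are all hypothetical choices made for illustration. The sketch only builds the request bodies as plain dictionaries: an index mapping with `dense_vector` and `geo_shape` fields, and a kNN search whose filter restricts hits to documents intersecting a query area.

```python
EMBEDDING_DIMS = 768  # assumption: a common output size for RoBERTa-base encoders


def coords_to_geojson(coord_string):
    """Convert a coordinate string into a GeoJSON Point or Polygon.

    Assumes a hypothetical 'lon lat; lon lat; ...' format with values
    already expressed in WGS84 (no reprojection is performed here).
    """
    pairs = [
        [float(value) for value in part.split()]
        for part in coord_string.split(";")
        if part.strip()
    ]
    if any(len(pair) != 2 for pair in pairs):
        raise ValueError("each coordinate must be a 'lon lat' pair")
    if len(pairs) == 1:
        return {"type": "Point", "coordinates": pairs[0]}
    if len(pairs) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")
    ring = [list(pair) for pair in pairs]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must be closed
    return {"type": "Polygon", "coordinates": [ring]}


def build_mapping(dims=EMBEDDING_DIMS):
    """Index mapping with a text field, a dense vector, and a geometry."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
                "location": {"type": "geo_shape"},
            }
        }
    }


def build_query(query_vector, area, k=5):
    """kNN search body limited to documents whose geometry intersects `area`."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,
            "filter": {
                "geo_shape": {
                    "location": {"shape": area, "relation": "intersects"}
                }
            },
        }
    }


# Example: a rectangular query area and a placeholder query embedding.
area = coords_to_geojson("126 35; 128 35; 128 37; 126 37")
query_body = build_query([0.0] * EMBEDDING_DIMS, area)
```

Applying the spatial condition as a filter inside the kNN clause means only spatially matching documents are ranked by similarity, which is one way to evaluate both criteria jointly. Whether GeoBot combines them this way is not stated in the text.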