Preprint 2 of a study of diacritic restoration (šišana/dešišavanje) in the Bosnian–Croatian–Serbian (BCS) Latin standards. Comparing three independently compiled standard lexicons, it shows that the diacritization of a stripped form is standard-independent for more than 99.5% of the lexicon: standard differences surface as different stripped forms, not as different diacritizations of the same form. The result is corroborated by the minimal foreign-language contamination of the shared lexicon and by the structural asymmetry of detecting the standard from text. It is further supported by an independent Universal Dependencies benchmark against the corpus-trained REDI tool, on which a compact (~12.8 MB) offline dictionary restorer reaches near-ceiling accuracy at the lowest false-positive rate. This represents a different point on the accuracy/footprint trade-off, not a claim of superior accuracy. A native-speaker validation study is outlined. This deposit contains the article in two language versions: English (preprint2deposit.pdf, with full data appendices) and a neutral Serbo-Croatian/BCS version (preprint2deposit_hbs.pdf). Full derived data lists are reproducible from open frequency sources via the included pipeline and are available from the author on request.
Ilya V. Osipov (Tue,) studied this question.
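To make the central idea concrete, the following is a minimal sketch of dictionary-based restoration over stripped forms. It is not the author's pipeline or the restorer benchmarked in the preprint: the toy lexicon, the stripping map (including the choice of "dj" for "đ"), and the names strip_diacritics, build_index, and restore are all assumptions made for illustration. The sketch restores a word only when its stripped form has exactly one candidate in the lexicon, which is one simple way to keep false positives low.

```python
# Illustrative sketch only; the lexicon, mapping, and function names are hypothetical.
from collections import defaultdict

# Assumed stripping convention: each diacritic letter maps to a plain-ASCII form.
STRIP_MAP = str.maketrans({
    "š": "s", "Š": "S", "č": "c", "Č": "C", "ć": "c", "Ć": "C",
    "ž": "z", "Ž": "Z", "đ": "dj", "Đ": "Dj",
})

def strip_diacritics(word: str) -> str:
    """Return the stripped ("šišana") form of a word."""
    return word.translate(STRIP_MAP)

def build_index(lexicon):
    """Map each stripped form to the set of lexicon words that produce it."""
    index = defaultdict(set)
    for word in lexicon:
        index[strip_diacritics(word.lower())].add(word.lower())
    return dict(index)

def restore(word: str, index) -> str:
    """Restore diacritics only when the stripped form has a single candidate."""
    candidates = index.get(word.lower(), set())
    if len(candidates) != 1:
        return word  # unknown or ambiguous: leave the input untouched
    restored = next(iter(candidates))
    if word[:1].isupper():
        restored = restored[0].upper() + restored[1:]  # keep initial capital
    return restored

# Toy lexicon: "reč" and "riječ" are standard-specific variants of one word.
LEXICON = ["šišana", "čaša", "kuća", "kuca", "sat", "žena", "reč", "riječ"]
INDEX = build_index(LEXICON)

if __name__ == "__main__":
    for token in ["casa", "rec", "rijec", "kuca", "Zena", "sat"]:
        print(token, "->", restore(token, INDEX))
```

In this toy data, "reč" and "riječ" strip to different forms ("rec" and "rijec"), so each is restored without knowing which standard the text follows, while "kuca" stays unchanged because both "kuća" and "kuca" are candidates.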