September 18, 2024. Open Access

Source criticism, bias, and representativeness in the digital age

Abstract

Historians must critically scrutinize their sources, a task further complicated in the digital age by the need to evaluate the technical infrastructure of digital archives. This article examines digital newspaper archives and finds error rates in optical character recognition (OCR) that compromise the reliability of search results, as well as word-frequency datasets that are biased by the way the OCR corpus was assembled and later post-processed. Beyond these technical issues, copyright restrictions hinder access to crucial newspapers, and incomplete archives raise problems of representativeness. Accessing datasets from different countries is cumbersome, commercial archives are costly, and uneven publication rates over time require results to be corrected. Using digital archives therefore brings new tasks: the researcher must account for the reliability of the digital source, which often can only be done in interdisciplinary working groups. Digital archives, for their part, must ensure transparency by documenting for researchers the technical manipulations performed on the original source.
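
The correction for uneven publication rates mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible approach, dividing the number of hits for a search term in each year by the total number of words in the archive for that year. The function name `relative_frequency`, the `per` parameter, and all figures are hypothetical and are not taken from the article, which may use a different correction.

```python
from typing import Dict


def relative_frequency(
    hits: Dict[int, int],
    totals: Dict[int, int],
    per: int = 1_000_000,
) -> Dict[int, float]:
    """Return hits per `per` words for each year with a known corpus size.

    hits   -- raw number of matches for a search term, keyed by year
    totals -- total number of words in the archive, keyed by year
    """
    rates: Dict[int, float] = {}
    for year, count in sorted(hits.items()):
        total = totals.get(year, 0)
        if total <= 0:
            # Corpus size unknown for this year: skip it rather than guess.
            continue
        rates[year] = count * per / total
    return rates


# Hypothetical figures, for illustration only.
hits = {1900: 50, 1950: 200}
totals = {1900: 1_000_000, 1950: 8_000_000}

rates = relative_frequency(hits, totals)
for year, rate in rates.items():
    print(year, rate)
```

With these invented figures the raw count rises fourfold between the two years, while the rate per million words falls by half, because the archive holds eight times as much text for the later year. Such a normalization does not address OCR errors or gaps in the archive, which affect both the hits and the totals.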

Cite This Study

Jørgen Burchardt studied this question.

synapsesocial.com/papers/68e581eab6db64358751f5ef https://doi.org/10.5617/dhnbpub.11512