The Ethnological Museum in Berlin houses one of the largest ethnological collections in Germany. It was closely tied to the political structures of the German Reich and served as the central hub for cultural objects from German colonies. The museum's data reflect this colonial origin and pose challenges for digital approaches. In the project Uncertainties in the Archives, I attempted to map the museum’s collection areas. This involved obtaining geographical coordinates from online databases and visualizing the results as a heatmap in QGIS.

In doing so, I encountered many uncertainties, ambiguities, and inconsistencies in the data. Geographic descriptions proved especially challenging because they are ambiguous and noisy. In some cases, the same name refers to different places, like Bali, which is an island in Indonesia as well as a neighbourhood in modern Cameroon. In other cases, place names changed over time or differ across languages (Wrocław and Breslau), or they include markers of uncertainty (Berlin?).

Geographical databases like GeoNames or Wikidata struggle to reliably identify colonial and historical place names and thus introduce technical biases. Place names that cannot be matched are excluded, which affects the analysis and the final visualization and makes data points invisible.

Rather than noisy data, a term that implies information that can be ignored, I prefer the term dirty data, coined by C. Lemercier, to better describe the complexity of humanities data. The process of data cleaning seemingly removes noise, but it actually removes valuable information about collection practices and uncertainties. Identifying these biases and uncertainties in the data and in the digital processes is essential for a critical evaluation. The example of the Ethnological Museum in Berlin shows typical challenges faced when working with noisy data in the GLAM sector.
Firmin Forster (Tue,) studied this question.
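
To illustrate the kinds of place-name problems described above (homonyms such as Bali, renamed places such as Breslau and Wrocław, and uncertainty markers such as "Berlin?"), the following is a minimal sketch of how such entries could be annotated instead of silently dropped. It is not the workflow used in the project. The gazetteer, the alias table, the `annotate` function, and the rounded coordinates are all hypothetical stand-ins for a real lookup against a service like GeoNames or Wikidata.

```python
from dataclasses import dataclass, field

# Hypothetical mini-gazetteer standing in for an external database.
# Coordinates are rough, illustrative values (lat, lon), not authoritative.
GAZETTEER = {
    "bali": [("Bali, Indonesia", -8.4, 115.2), ("Bali, Cameroon", 5.9, 10.0)],
    "wrocław": [("Wrocław, Poland", 51.1, 17.0)],
    "berlin": [("Berlin, Germany", 52.5, 13.4)],
}

# Hypothetical table of historical or foreign-language name variants.
ALIASES = {"breslau": "wrocław"}


@dataclass
class PlaceRecord:
    raw: str                 # the string exactly as found in the museum data
    uncertain: bool          # True if the source marked the place with "?"
    status: str              # "resolved", "ambiguous", or "unresolved"
    candidates: list = field(default_factory=list)


def annotate(raw: str) -> PlaceRecord:
    """Look up a place name while preserving ambiguity and uncertainty."""
    text = raw.strip()
    uncertain = text.endswith("?")
    key = text.rstrip("?").strip().casefold()
    key = ALIASES.get(key, key)  # map historical variants to a modern key
    candidates = GAZETTEER.get(key, [])
    if not candidates:
        status = "unresolved"    # kept in the output, not discarded
    elif len(candidates) > 1:
        status = "ambiguous"     # several places share this name
    else:
        status = "resolved"
    return PlaceRecord(raw, uncertain, status, candidates)


if __name__ == "__main__":
    for name in ["Bali", "Breslau", "Berlin?", "Unknown place"]:
        record = annotate(name)
        print(record.raw, "|", record.status, "| uncertain:", record.uncertain)
```

The design choice in this sketch is that every input yields a record: unresolved and ambiguous names remain visible with an explicit status, so the information that would otherwise be "cleaned" away can still be inspected or reported alongside the map.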