Studying a changing world requires observations going back in time to extend and contextualize our latest scientific knowledge. Old legacy data exist in non-digital formats. Thus, techniques and methodologies for the preservation, dissemination, interpretation, homogenization, calibration and present scientific use of such legacy data and their associated metadata are important topics for advancing our understanding of the changing Earth and of past extreme events. The articles presented in this special issue review different issues involved in these diverse topics, including the importance of preserving old data and metadata, the actors involved in the task and the problems in converting them to digital files and databases, and offer some pointers for the future. Young researchers are used to working with digital data and accessing the databases where they are stored. However, digital data are a new paradigm valid only for the last 20–30 years, and even then, not in all locations or for all data types. Many data collection, handling and interpretation methods for analog data, in paper or tape format, are documented by senior scientists and preserved in laboratories, libraries and archives (but are not published or available for general consultation). Digitizing the large amount of analog data acquired in the past looks to be the best and easiest way to access and save them for the future. This is not a simple task. Transforming the old data to digital format, accessible under FAIR principles, requires long preprocessing to get them into machine-readable and analysable form. For these reasons, and to collect different examples of these processes useful for the research community, we proposed a special issue devoted to these topics. It was accepted and we now present the result of this effort.
We take this opportunity to review the different issues involved in these diverse topics, including the importance of preserving old data and metadata, the actors involved in the task and the problems in converting them to digital files and databases, and to offer some pointers for the future. Our preface is therefore longer than usual and reviews in detail the themes brought to light by the published articles. Studies in the geosciences need data acquired from the natural world. Studying a changing world requires observations going back in time to extend and contextualize our latest scientific knowledge. Moreover, reanalysis of old geoscience data, ranging from the deep Earth interior to the solar-terrestrial environment, in the light of our present knowledge, has become an important tool for understanding topics spanning from solar and geomagnetic variability to climatic change, tectonics or Earth rotation. Past extreme natural events, including magnetic storms, hurricanes, rainfall, floods and earthquakes, can also be analysed in unprecedented detail and depth using historical records. Old records are extremely diverse. They range from numerical or graphical observations consciously acquired to record particular phenomena to historical documents not originally intended for scientific purposes at all. A common characteristic of these preserved historical observations and records that have been directly acquired by humans (i.e., excluding geological, natural or proxy records) is that they are usually found in non-digital forms and are often contained in unique analog documents, such as written documents, graphs, etchings and drawings, instrument recordings such as seismograms and barograms, photographs, or even architectural structures.
Historical information useful for geoscience studies may also be retrieved from documentary evidence such as narrative sources and legal-administrative institutional documentation (e.g., chronicles, newspapers, private and official protocols and correspondence, account books) not originally intended for use in scientific studies. Thus, the historical documents used to acquire scientific knowledge can include numerical data, graphical and visual records, written descriptions and illustrations that are coded into categorical data. But legacy records are in danger. Within the fields of meteorology and hydrology alone, and despite the efforts of the WMO and other institutions (I-DARE, 2017), it is estimated that the world loses over 500,000 historic hydrometeorological observations daily due to factors such as fire, flood, or deterioration of the medium on which the observations are stored (e.g., paper, microfiche/microfilm, magnetic tape, glass). The astrophysical community has also pointed out the urgent need for data preservation and digitization, as it is known that even early digital datasets are exposed to a risk of loss as the methods for recording and calibration become obsolete (Pevtsov et al., 2019). Custodians of the data can be unaware of the value of, and the need to preserve, these data and their metadata. Once gone, these data are lost forever. This also holds for early digital data, as exemplified by the experience of Nagihara et al. (2018) with lunar heat flow data. Legacy data, their preservation and dissemination are important for the progress of Earth System Science. Techniques and methodologies for the preservation, dissemination, interpretation and homogenization of legacy data and metadata, as well as for their present scientific use, are important topics for the future advancement of our understanding of the changing Earth and of past extreme events.
Different approaches have been devised to deal with different legacy data and the specific problems they pose. The recording and preservation of metadata are as important as the data themselves, especially for old data and for future generations of users. Even when the data themselves have been acquired and preserved, metadata such as the instrument details or calibration methods have in many cases not been fully described, as they were assumed to be common knowledge. Just some decades later, this knowledge is lost when the instruments are decommissioned. For these reasons, this GDJ special issue is intended to share accumulated experiences with legacy data in the different fields of the geosciences, including the difficulties and obstacles faced in collecting and producing historical geoscience datasets. Its genesis was at the JS06 session "Old Data for new knowledge: Preservation and Utilization of Historical Data in the Geosciences", held at the 2019 IUGG General Assembly, in Montreal, which brought together geoscientists from a range of disciplines to discuss "Old Data for New Science". Our intention has been to assemble a range of contributions, from data papers covering historical geoscience datasets with descriptions of their metadata to data services papers which describe best practices, while including papers discussing the development of system technique tools to address the topics of data collection, sharing, analysis, revisions, instrumental details and data visualization in the historical geosciences. Our present-day records are acquired in digital form. Long gone are the days of handwritten observations, analogue strip charts or any kind of paper printout at leading institutions. Most records are stored in digital databases that are easy to access, with known standard formats that are shared among the scientific community. In most cases, they also have better dynamic range and resolution than ever before.
Why, then, are we so interested in old data for new science? One key point is that our old data records are the only ones we have that recorded actual events, rather than (for example) model runs, so they record the true variability and extremes of our geophysical environment. Measured and recorded data thus reveal unexpected details and can allow us to test and occasionally re-evaluate our models and theories. In the geo and space sciences (traditionally called "observational sciences"; Okasha, 2011), we can only observe in real time. This is why historical records are so valuable: they extend our records and knowledge back in time, in some cases by centuries, and allow us to capture more events and analyse them in more detail. Nevertheless, the importance of the preservation and correct use of old data is not a new topic. Their utility was being demonstrated as early as the XVII century, when Henry Gellibrand used previous observations to show the variation of geomagnetic declination with time (Gellibrand, 1635). Today, in the modern digital era, a lot of work has been done in different fields of the geosciences and examples of using old data for modern geophysical research abound. Among them, Giménez et al. (1996) show how precision levelling data acquired for civil works more than a century ago are now useful for the study of surface neotectonics and present-day deformation. Also, the use of vectorized analogue seismograms for the study of old earthquakes has been fruitfully attempted by many authors (e.g., Pino et al., 2000; Wald et al., 1993) and its strengths and limitations approached in a more general way (Batlló et al., 2008). This is also the case with geomagnetic measurements (Beggan et al., this issue). Historical documents originally not intended as scientific records allow us to derive new scientific knowledge decades or centuries later, in combination with our contemporaneous scientific observations and modern numerical models.
One well-known example is the Carrington storm of September 1859, one of the greatest and best-known space weather events of the last two centuries (Hayakawa et al., 2019). Its legacy data have recently been reanalyzed for geomagnetic disturbances at Rome in Italy (Blake et al., 2020) and Colaba in India (Hayakawa, Nevanlinna, et al., 2022), allowing new scientific discussion of forgotten details of this "greatest" space weather event. The preservation and use of legacy records allow for new and innovative studies not envisaged when the data were originally obtained. Ben-Menahem (1975) used seismograms to investigate the 1908 meteor air burst at Tunguska. Krüger et al. (2018) used imprints of earthquakes in magnetograms to investigate focal parameters of Central Asia events. We can also see this with historical accounts of solar eclipses, astronomical spectacles that have been recorded for millennia. Centuries or even millennia later, these records offer unique base references, unintended by the contemporaneous observers, to evaluate the Earth's rotation variability on the centennial time scale (Hayakawa, Murata, Morrison et al., 2021; Stephenson et al., 2016) or to study solar coronal activity in the grand solar minima (Hayakawa et al., 2021). Meteorology deserves special attention, as instruments to measure temperature, pressure and humidity were first invented in the XVII century and the use of historical observations to examine climate change and variability has been known since the XVIII century (e.g., Mann, 1792). Data use and reuse has been constant over the centuries in the meteorological, oceanographic and climatological communities, by scientists and researchers such as Kelly (1837), Dove (1840), Renou (1885) and Jones et al. (1982). Meteorology has a long-established tradition of the acquisition and management of long data series (Riishojgaard et al., 2021).
Data exchange and networks with concomitant data standards were also first established in the XVIII century with the French Société Royale de Médecine (Kington, 1980) and the Palatine Meteorological Society of Mannheim (Cassidy, 1985). International data exchange was formally established in the mid XIX century and has grown continually more sophisticated ever since. In the XXI century, international efforts have gone towards producing databanks of rescued historical data such as the International Comprehensive Ocean–Atmosphere Data Set (ICOADS; Freeman et al., 2017) or the International Surface Pressure Databank (ISPD; Cram et al., 2015). These observations are also used as input in reanalysis projects, which provide instrument-based reanalyzed data processed by weather forecasting models, such as the ECMWF full atmosphere reanalysis project ERA-20C (1900–2010; Poli et al., 2016). NOAA's 20th Century Reanalysis Project (20CR) uses surface pressure data provided by historical data rescue projects from around the world to produce historical reanalysis meteorological data around the globe at 3-h intervals from 1836 to 2015 (Slivinski et al., 2021). The initiatives of regional or national data rescue projects, global data banks and the integration of disparate data records into a project such as the 20th Century Reanalysis have been brought together under the umbrella of the international Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative (Allan et al., 2011). For nearly two decades, researchers have been meeting annually to discuss projects, progress in data rescue and common goals, and have striven to make rescued data more accessible. Standards have been developed for metadata and file sharing, including standardized file exchange formats and file naming conventions.
As previously mentioned, not only are there centuries' worth of observations in an almost unimaginable variety of formats still to rescue, but previously digitized observations also exist in a wide variety of formats and media, and themselves need to be reconfigured. We should point out here that, even in the digital era, not all data are acquired in digital form. Small observatories and networks with limited resources, or those in developing countries, are still using analogue instruments, manual recording and strip charts as the official data record. In some countries, such records are not immediately accessible to external scientists despite their importance to climate models (Nordling, 2019). These records are important and deserve the same care that we are asking for legacy data. Finally, let us point out that scientific data rescue is also an inherently multidisciplinary task. As with all historical records, we work closely with archivists as well as with the librarians and data scientists involved in any major data undertaking. However, all scientific processes take place in a specific social and cultural context. With historical data, we need to go a step further to understand the historical environmental, social and cultural, as well as the scientific, context in which the data collection was taking place. Furthermore, the data themselves can often contain information of cultural or social importance, such as the names, social context and conditions of the people recording the information: were they scientists, technicians, students, military, civilian, servants, officers or enlisted soldiers? Further afield, did they record other environmental or sociological conditions, such as famine?
Ships' logs, in addition to containing meteorological and oceanographic data, can include ship cargos, crew identification, contact with other vessels either friendly or adversarial, port and customs information, medical information on the health of the crew or diseases prevalent in various ports and other sociological information. Rescuing geoscientific observations can also be an opportunity to explore socio-historic conditions (e.g., Allan et al., 2016). The collected articles in this special issue offer a wide view of the different issues that the preservation, distribution and use of old legacy data pose. While there is much thematic overlap among the articles, we have grouped them here into four major themes. These are: digitization of historical observations; data review from recently closed observatories; data portals; and data applications. Under the theme of digitization of historical observations, Burt (this issue, p. 3–17) – A twice-daily barometric pressure record from Durham Observatory in north-east England, 1843–1960 – gives an excellent overview of the painstaking and detailed processes involved in recovering historical meteorological observations. He describes in detail the process of recovering the barometric observations from the Durham Observatory in the UK, making them digitally available, transforming them into modern units and verifying the accuracy of the recovered data. Hejda et al. (this issue, p. 39–44) – Magnetic storm and term-day observations at the Prague observatory Clementinum in the mid-19th century – in their discussion of magnetic observations taken at the Prague Observatory from 1839 to 1849, describe the instruments, observing conditions and issues with unit conversions in historical geomagnetic observations.
These 19th century geomagnetic observations form unique references for historical geomagnetic storms at the time and could hold important keys to 21st century problems with space weather, such as solar storms impacting our technology-dependent societies. Beggan et al. (this issue, p. 73–86) – Digitizing UK analogue magnetogram records from large geomagnetic storms of the past two centuries – describe the British Geological Survey (BGS) efforts to digitize the UK magnetograms and make the point that the challenges posed by constant changes of technology over time and the historical journey of each particular observatory make it impossible to have a "one size fits all" approach to capturing and modernizing historical geomagnetic records. The experience gained from each project can be shared, however, to work towards developing a generalized, albeit flexible, framework. Quality control and error checking are crucial components of the historical data recovery process. Hayakawa et al. (this issue, p. 87–98) – Sunspot observations at Kawaguchi Science Museum: 1972–2013 – document sunspot observations at Kawaguchi Science Museum in 1972–2013 and describe how the work of a single dedicated observer can have a major impact on sunspot number recalibrations by providing consistent and homogeneous observations. Luckily, this observer's record was properly preserved at Kawaguchi Science Museum. However, such data, no less than official governmental or observatory records, are also in need of data rescue; perhaps all the more so as they are at greater risk of being overlooked and lost. Nagamachi et al. (this issue, p. 45–62) – Historical data of atmospheric electric field observations in Japan – in their research on the collection of atmospheric electricity readings at two observatories in Japan (Kakioka in 1929–2021 and Memambetsu in 1950–2010), highlight the importance of knowing the station history and metadata when interpreting past data.
Knowledge of the site characteristics and information on past data-gathering practices help put older observations in context. Nagamachi et al. (this issue, p. 18–38) – Historical data of geoelectric field observations in Japan – deal with geoelectric field measurements at Kakioka (1932–2023), Memambetsu (1949–2021), and Kanoya (1948–2021). The authors explore how geoelectric field measurements, designed to measure the Earth's currents, developed over the 20th century in Japan, both in terms of the instrumentation and the data recording. The paper demonstrates how the digitization of records and the reporting of historical developments in data recording and innovation go hand-in-hand. Dismayingly, the three major observatory clusters stopped geoelectric field measurements in 2021, despite their scientific importance. Shimojo and Iwai (this issue, p. 114–129) – Over seven decades of solar microwave data obtained with Toyokawa and Nobeyama Radio Polarimeters – provide a historical overview of radio observations in Japan, the challenges posed by increasing human use of radio frequencies, and a detailed description of the digitization project. The data from their digitization project are thoroughly described, with discussion of the formats into which the end data and metadata are exported and the many programming languages with which the data are compatible. Curto et al. (this issue, p. 99–113) – Service of rapid magnetic variations, an update – give an overview of how communicating data, and particularly solar storm alerts, has changed over time in the database of the International Service on Rapid Magnetic Variations at Ebro Observatory. Communication of potentially hazardous events has both shaped the provided services and itself been shaped by the technological changes. The database has even been reinforced with algorithms for automatic detection at several key observatories, as a complement to human data processing.
Together with internet facilities, this recent improvement has sped up the whole process of event determination and made the database more reliable and accessible. Tanaka et al. (this issue, p. 130–141) – Advanced tools for guiding data-led research processes of Upper-Atmospheric phenomena – show that data, whether historical or otherwise, are of little use if researchers cannot find or easily interpret the metadata needed to properly determine what the data represent. This paper describes IUGONET Type-A, a tool developed specifically as an upper atmosphere data service. IUGONET Type-A includes data visualization and other tools in its data cataloguing service. Ichino and Masuda (this issue, p. 63–72) – Rekiske: Interdisciplinary platform for sharing knowledge and experience of Japanese historical documents – show how interdisciplinarity, interoperability and data availability are also important to organizing historical data and making them available. They look at how historical documentary data are used in a range of disciplines and bring data together from a wide variety of sources in a single application for scholars to contribute, locate, use and improve upon known datasets. Hayakawa et al. (this issue, p. 142–157) – A review for Japanese auroral records on the three extreme space weather events around the International Geophysical Year (1957–1958) – bring together both observatory records and citizen science observations to increase documentation of auroral extensions during solar storms and reconstruct the equatorward boundaries of the auroral oval for each major event. This paper highlights one of the most critical facets of historical data needed to create new knowledge: that they are our only source for rare but high-impact geophysical events, whether they be solar, tectonic or climatological in origin. Corona-Fernandez and Santoyo (this issue, p.
178–192) – Re-examination of the 1928 Parral, Mexico earthquake (M6.3) using a new multiplatform graphical vectorization and correction software for legacy seismic data – develop an open-source application to digitize scanned seismograms. Developing shared and freely available software to digitize and correct historical paper records, tailored towards each particular type of record, is an enormous step forward in the daunting task of making past records available for the urgent work of furthering our understanding of our terrestrial and space environment. Gomes et al. (this issue, p. 178–192) – The importance of scientific data and historical heritage of the geophysical and astronomical observatory of Coimbra university for the study of geophysical sciences – speak to the importance not only of data and physical knowledge but also of the social and cultural value of the history of science and the accumulation of knowledge. Science is a progressive endeavour, but it is one built on a foundation of accumulated knowledge. Without that foundation, we have nothing on which to build future knowledge. This observatory offers an example of a center curating a large amount of historical data but with difficulties putting them online as a recognized repository. Instead, specific series may be found at larger recognized repositories. Some common themes and challenges emerge from this special collection. Issues such as changes in standards over time, changes in technology, struggles to convert to common units and interoperability between different measuring, recording, information transmission and data storage systems are mentioned in nearly all the papers.
Other concerns that are mentioned less frequently and are not directly related to data repositories, but that are still causes for serious concern, include the way increasing human interference, in fields as diverse as radar for solar monitoring, ground electrical currents or ground seismic and meteorological observations, affects the homogeneity of records, or in some cases the ability to keep monitoring the geophysical or space weather environment at all. Changing technologies and data recording methods also affect long-term data homogeneity. The publication, standards and sharing of the rescued data are also something authors have mentioned as a struggle. While GDJ has championed the publication of historical datasets, there are still some barriers in the way. We notice, for example, that not all institutes have the administrative and financial resources to assign DOIs to every scientific dataset. This limitation may hinder data rescue outside of large institutes in developed countries. Another issue is data access. When data or historical records are derived or acquired from external archives (such as national or city archives, not specifically devoted to scientific data), researchers need to follow the policies of the record or data holders. Usage terms and conditions are diverse: some archives or record holders do not allow open reuse of documents and data, do not allow external researchers to obtain or purchase more than a token number of original materials, or charge significant scanning costs, which prevents researchers from scanning entire collections. In some cases, online reproduction is not allowed. In such cases, it becomes impossible for scientists to provide original source materials. The scientific community needs to be aware of such cases and should not require contributors to upload everything online.
Finally, the availability of resources is also discussed several times in this collection and is a general topic of concern in the data rescue community. Many papers in this collection point out that the resources needed to curate both analogue observations and digital data, safeguard, transform and make accessible information to a variety of stakeholders are considerable. Currently, not only is our common scientific heritage being lost, but investment in maintaining current geoscientific observations is declining. As this special edition attests, observatories are closing and funding for maintaining observations and observers is declining or being stopped altogether in many parts of the world. Citizen science and crowdsourcing initiatives may alleviate the shrinking of public resources allocated to data preservation and distribution (e.g., Ashcroft et al., 2016; Craig Ishii Lorrey et al., 2022; Sieber Guidoboni for example, compliance with FAIR (findability, accessibility, interoperability and reusability) data principles (Wilkinson et al., 2016), repository or database compatibility and improved definitions for formats and standards for legacy data. Other issues go further than data and databases. Among them, we highlight securing the preservation of the original physical records. It should be clear that the first step towards the use of legacy data should be to rescue/preserve the records, otherwise they will be lost forever. The second step, its proper digitization, requires validation that should be secured by proper procedures and metadata. Finally, an important future topic is to devise procedures to facilitate the access and digitization of data preserved at archives not specialized or acquainted with the needs posed by the research on numerical data, and which have strict policies on the distribution and use of documents. 
We hope this special issue will help to grow the interest and attention on the highlighted topics and many others that will be publicized and discussed in the future and that the pages of GDJ will contribute to these further developments. As associated editors of this special volume, we would like to thank the authors of the different articles for their willingness to contribute to it and the reviewers for their generous effort. H.H. thanks the ISEE director's leadership fund for FYs 2021–2023. This article has been awarded Open Data Badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. Data is available at Open Science Framework
Josep Batlló
Hisashi Hayakawa
Victoria Slonosky
Geoscience Data Journal
McGill University
Nagoya University
International Rescue Committee
www.synapsesocial.com/papers/68e778e0b6db6435876eddb9 — DOI: https://doi.org/10.1002/gdj3.243