The OpenAIRE graph contains a large citation graph dataset, with over 200 million publications and over 2 billion citations. The current graph is available as a dump with metadata that totals 2.5 TB uncompressed, which makes it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE network that is downscaled to 32 GB while preserving the full graph structure. We also offer the processed data in a very simple format that allows straightforward further manipulation.

The files are:

publications.csv - The nodes in the citation graph
citations.csv - The edges in the citation graph
publication_large.csv - The nodes, but with several fields for additional features

All files are compressed (.xz) files.

The fields in publication_large.csv:

Field        Explanation                                                 Memory usage (GB)
nodeId       Unique internal identifier for the node (publication)       2
openaireId   Identifier assigned by the OpenAIRE platform                18
doi          Digital Object Identifier of the publication                13
title        Title of the publication                                    28
authors      List of authors associated with the publication             20
description  Abstract or short description of the publication            192
date         Date when the publication was published                     11
container    Journal, conference, or repository where it was published   13
citations    Number of times the publication has been cited              2
language     Language in which the publication is written                10

The pipeline used to produce the data is found in the pipeline.tar.xz file. It is also found here: link
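Because the files are .xz-compressed CSV, they can be read row by row without first unpacking them to disk. The following is a minimal sketch using only the Python standard library. It assumes that each file starts with a header row, is comma-delimited, and is UTF-8 encoded; none of this is stated above. The helper names `iter_rows` and `count_incoming` are hypothetical, and the edge column name must be taken from the actual header of citations.csv.

```python
import csv
import lzma
from collections import Counter
from typing import Dict, Iterator


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one CSV row at a time from an .xz file, decompressing on the fly.

    Assumes a header row, comma delimiters, and UTF-8 text (not confirmed
    by the dataset description).
    """
    # newline="" leaves line-break handling inside quoted fields to csv.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def count_incoming(path: str, target_column: str) -> Counter:
    """Count how often each node id appears in one column of an edge file.

    `target_column` is whatever the edge file calls the cited node;
    the real column name has to be read from the file header.
    """
    counts: Counter = Counter()
    for row in iter_rows(path):
        counts[row[target_column]] += 1
    return counts
```

Streaming keeps only one row in memory at a time, which matters mainly for publication_large.csv, where the description field alone accounts for 192 GB in the table above. Note that `count_incoming` still holds one counter entry per distinct node, so on the full edge list it needs memory proportional to the number of cited publications.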
Skarding et al. (Thu,) studied this question.