October 11, 2012 | Open Access

Twitter Archives and the Challenges of "Big Social Data" for Media and Communication Research

Summary

Lists and Social Media

Lists have long been an ordering mechanism for computer-mediated social interaction. While far from being the first such mechanism, blogrolls offered an opportunity for bloggers to provide a list of their peers; the present generation of social media environments similarly provides lists of friends and followers. Where blogrolls and other earlier lists may have been user-generated, the social media lists of today are more likely to have been produced by the platforms themselves, and are of intrinsic value to the platform providers at least as much as to the users themselves; both Facebook and Twitter have highlighted the importance of their respective “social graphs” (their databases of user connections) as fundamental elements of their fledgling business models. This represents what Mejias describes as “nodocentrism,” which “renders all human interaction in terms of network dynamics (not just any network, but a digital network with a profit-driven infrastructure).”

The communicative content of social media spaces is also frequently rendered in the form of lists. Famously, blogs are defined in the first place by their reverse-chronological listing of posts (Walker Rettberg), but the same is true for current social media platforms: Twitter, Facebook, and other social media platforms are inherently centred around an infinite, constantly updated and extended list of posts made by individual users and their connections. The concept of the list implies a certain degree of order, and the orderliness of content lists as provided through the latest generation of centralised social media platforms has also led to the development of more comprehensive and powerful, commercial as well as scholarly, research approaches to the study of social media. Using the example of Twitter, this article discusses the challenges of such “big data” research as it draws on the content lists provided by proprietary social media platforms.
Twitter Archives for Research

Twitter is a particularly useful source of social media data: using the Twitter API (the Application Programming Interface, which provides structured access to communication data in standardised formats) it is possible, with a little effort and sufficient technical resources, for researchers to gather very large archives of public tweets concerned with a particular topic, theme or event. Essentially, the API delivers very long lists of hundreds, thousands, or millions of tweets, and metadata about those tweets; such data can then be sliced, diced and visualised in a wide range of ways, in order to understand the dynamics of social media communication. Such research is frequently oriented around pre-existing research questions, but is typically conducted at unprecedented scale. The projects of media and communication researchers such as Papacharissi and de Fatima Oliveira, Wood and Baughman, or Lotan et al. (to name just a handful of recent examples) rely fundamentally on Twitter datasets which now routinely comprise millions of tweets and associated metadata, collected according to a wide range of criteria. What is common to all such cases, however, is the need to make new methodological choices in the processing and analysis of such large datasets on mediated social interaction.

Our own work is broadly concerned with understanding the role of social media in the contemporary media ecology, with a focus on the formation and dynamics of interest- and issues-based publics. We have mined and analysed large archives of Twitter data to understand contemporary crisis communication (Bruns et al.), the role of social media in elections (Burgess and Bruns), and the nature of contemporary audience engagement with television entertainment and news media (Harrington, Highfield, and Bruns).
Using a custom installation of the open source Twitter archiving tool yourTwapperkeeper, we capture and archive all the available tweets (and their associated metadata) containing a specified keyword (like “Olympics” or “dubstep”), name (Gillard, Bieber, Obama) or hashtag (#ausvotes, #royalwedding, #qldfloods). In their simplest form, such Twitter archives are commonly stored as delimited (e.g. comma- or tab-separated) text files, with each of the following values in a separate column:

text: contents of the tweet itself, in 140 characters or less
to_user_id: numerical ID of the tweet recipient (for @replies)
from_user: screen name of the tweet sender
id: numerical ID of the tweet itself
from_user_id: numerical ID of the tweet sender
iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
source: client software used to tweet (e.g. Web, Tweetdeck, ...)
profile_image_url: URL of the tweet sender’s profile picture
geo_type: format of the sender’s geographical coordinates
geo_coordinates_0: first element of the geographical coordinates
geo_coordinates_1: second element of the geographical coordinates
created_at: tweet timestamp in human-readable format
time: tweet timestamp as a numerical Unix timestamp

In order to process the data, we typically run a number of our own scripts (written in the programming language Gawk) which manipulate or filter the records in various ways, and apply a series of temporal, qualitative and categorical metrics to the data, enabling us to discern patterns of activity over time, as well as to identify topics and themes, key actors, and the relations among them; in some circumstances we may also undertake further processes of filtering and close textual analysis of the content of the tweets. Network analysis (of the relationships among actors in a discussion; or among key themes) is undertaken using the open source application Gephi.
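To make the kind of processing described above more concrete, the following is a minimal sketch in Python. It is not a reproduction of the Gawk scripts mentioned in the text. It assumes a tab-separated archive whose first row holds the column names listed above (with underscores), and it computes three simple measures: tweets per hour, tweets per sender, and a weighted list of @mention edges. The function names, the sample rows, and the user names are hypothetical and exist only for illustration.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime, timezone

# Column names as listed in the text; the header row is an assumption.
COLUMNS = [
    "text", "to_user_id", "from_user", "id", "from_user_id",
    "iso_language_code", "source", "profile_image_url", "geo_type",
    "geo_coordinates_0", "geo_coordinates_1", "created_at", "time",
]

# Simplified pattern for @mentions; it will also match e-mail-like strings.
MENTION = re.compile(r"@(\w+)")


def read_archive(handle, delimiter="\t"):
    """Return an iterator of dicts, one per tweet, keyed by the header row."""
    return csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)


def summarise(rows):
    """Count tweets per hour (UTC), tweets per sender, and @mention edges."""
    per_hour, senders, edges = Counter(), Counter(), Counter()
    for row in rows:
        stamp = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
        per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1
        sender = row["from_user"].lower()
        senders[sender] += 1
        for target in MENTION.findall(row["text"]):
            edges[(sender, target.lower())] += 1
    return per_hour, senders, edges


def edges_to_csv(edges):
    """Serialise edges as a Source,Target,Weight table for a network tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


def make_row(text, from_user, time):
    """Build one made-up archive line; unused columns are left empty."""
    row = dict.fromkeys(COLUMNS, "")
    row.update(text=text, from_user=from_user, time=str(time))
    return "\t".join(row[name] for name in COLUMNS)


# Invented sample data, not taken from any real archive.
SAMPLE = "\n".join([
    "\t".join(COLUMNS),
    make_row("Roads closed near the river #qldfloods", "alice", 1294791000),
    make_row("@alice thanks, sharing now #qldfloods", "bob", 1294792200),
    make_row("RT @alice: Roads closed near the river #qldfloods", "carol", 1294794600),
]) + "\n"

if __name__ == "__main__":
    per_hour, senders, edges = summarise(read_archive(io.StringIO(SAMPLE)))
    for hour, count in sorted(per_hour.items()):
        print(hour, count)
    print(senders.most_common(3))
    print(edges_to_csv(edges), end="")
```

The edge table uses Source, Target and Weight columns because Gephi's spreadsheet import recognises those headers. Whether a retweet should count as the same kind of edge as an @reply is an analytical decision that this sketch deliberately leaves open.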
While a detailed methodological discussion is beyond the scope of this article, further details and examples of our methods and tools for data analysis and visualisation, including copies of our Gawk scripts, are available on our comprehensive project website, Mapping Online Publics. In this article, we reflect on the technical, epistemological and political challenges of such uses of large-scale Twitter archives within media and communication studies research, positioning this work in the context of the phenomenon that Lev Manovich has called “big social data.” In doing so, we recognise that our empirical work on Twitter is concerned with a complex research site that is itself shaped by a complex range of human and non-human actors, within a dynamic, indeed volatile media ecology (Fuller), and using data collection and analysis methods that are in themselves deeply embedded in this ecology.

“Big Social Data”

As Manovich’s term implies, the Big Data paradigm has recently arrived in media, communication and cultural studies, significantly later than it did in the hard sciences, in more traditionally computational branches of social science, and perhaps even in the first wave of digital humanities research (which largely applied computational methods to pre-existing, historical “big data” corpora). This shift has been provoked in large part by the dramatic quantitative growth and apparently increased cultural importance of social media; hence, “big social data.” As Manovich puts it:

For the first time, we can follow the imaginations, opinions, ideas, and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, monitor the conversations they are engaged in, read their blog posts and tweets, navigate their maps, listen to their track lists, and follow their trajectories in physical space.
(Manovich 461)

This moment has arrived in media, communication and cultural studies because of the increased scale of social media participation and the textual traces that this participation leaves behind, allowing researchers, equipped with digital tools and methods, to “study social and cultural processes and dynamics in new ways” (Manovich 461). However, and crucially for our purposes in this article, many of these scholarly possibilities would remain latent if it were not for the widespread availability of Open APIs for social software (including social media) platforms.

APIs are technical specifications of how one software application should access another, thereby allowing the embedding or cross-publishing of social content across Websites (so that your tweets can appear in your Facebook timeline, for example), or allowing third-party developers to build additional applications on social media platforms (like the Twitter user ranking service Klout), while also allowing platform owners to impose de facto regulation on such third-party uses via the same code. While platform providers do not necessarily have scholarship in mind, the data access affordances of APIs are also available for research purposes. As Manovich notes, until very recently almost all truly “big data” approaches to social media research had been undertaken by computer scientists (464). But as part of a broader “computational turn” in the digital humanities (Berry), and because of the increased availability to non-specialists of data access and analysis tools, media, communication and cultural studies scholars are beginning to catch up.
Many of the new, large-scale research projects examining the societal uses and impacts of social media (including our own), initiated by media, communication, and cultural studies research leaders around the world, have begun their work by taking stock of the range of available tools and methods for data analysis, and often by substantially extending that range through new development. The research infrastructure developed by such projects, therefore, now reflects their own disciplinary backgrounds at least as much as it does the fundamental principles of computer science. In turn, such new and often experimental tools and methods necessarily also provoke new epistemological and methodological challenges.

The Twitter API and Twitter Archives

The Open

Cite This Study

Burgess et al. (2012) studied this question.

synapsesocial.com/papers/6a1c3e391567d2fc4d5fd16b https://doi.org/10.5204/mcj.561
