Introduction

A teenage girl wishes her friend a happy birthday in German. Two American kids test a camera and upload the results. Colleagues upload a recording of a meeting for the benefit of employees who cannot attend. A man sends a "good morning" message to his extended family in Hindi. These are not participants in the "Creator Economy" using YouTube to – as its tagline suggests – broadcast themselves; these are videos meant for a small audience, sharing banal pieces of everyday life. They are also publicly visible, and more common than you would think.

We came across these videos not because they were sent to us but because, for the past few years, we have been studying representative samples of the billions of videos uploaded to YouTube. Though the platform is more commonly understood through its viral successes, professional creators, and algorithmically uplifted content, the median YouTube video receives only 41 views (Zheng et al.). Peering into the less popular side of YouTube reveals a wide range of uses not visible among samples of popular videos – uses that provide a more accurate picture of how the platform fits into people's lives, not just as a medium for broadcasting but as infrastructure for all manner of audio/video communication. These overlooked, everyday uploads complicate assumptions about privacy – that uploaders of public videos understand the scope of their audience and the implications of their own disclosures – raising important ethical questions for researchers, platforms, users, and governments.

In this article, we introduce the concept of "accidental vlogs", videos in which people share aspects of their lives with a private or small audience in public. First, we review existing literature on privacy and social media, recontextualised for accidental vlogs and updated to reflect recent developments in the use of YouTube videos to train AI models.
Then, taking a critical lesson from our research into differences between linguistic communities on YouTube, we emphasise the importance of cultural specificity when thinking about privacy, using the case of Hindi YouTube. There appear to be more accidental vlogs in Hindi YouTube, which we argue is due to differences in social uses of technology and India’s newer, video-first user base. Accidental vlogs demand renewed, context-sensitive ethical consideration of user privacy at multiple levels, including among researchers.

Context and Audience

Among the foremost ethical frameworks for thinking about privacy on social platforms is Nissenbaum's contextual integrity, the principle that respect for privacy is respect for appropriate "information flows" (Privacy). As Erving Goffman observed in 1956, people present themselves differently based on context. When information disclosed to friends is shared with our parents or employers, it is a violation of contextual integrity. On the Internet, users can violate each other's contextual integrity, but the concept is more often used to talk about the infrastructure and policies of platforms. danah boyd, for example, described the way the structure of early social networking sites blurred public and private domains of communication and "collapsed contexts" such that a message meant for friends would also be shown to cousins and colleagues and vice versa (2).

Audiences are typically obscured on social platforms, so users must imagine one, creating contextual tensions when the imagined audience diverges from the actual audience (Litt). For someone building a brand, the imagined audience may be a broad target market, but for someone else it may just be a single friend. In a Twitter study, Alice Marwick and danah boyd […] public is simply the default setting, regardless of your imagined audience.
The site includes a wide range of privacy tools, policies on harassment, and internal user surveys about privacy, but these just create a "false sense of security", providing enough reassurance for users not to worry about privacy even while their data is used for opaque purposes (Mitchel 8).

The term "privacy paradox" describes a much-studied conflict between people's expressed concern for privacy and the reality of their information sharing practices (Barnes; Norberg et al.). Debates commonly focus on whether users' actions are the best gauge of their true attitudes, or whether platforms should provide better mechanisms to allow people to retain privacy while sharing. Daniel Solove argues that businesses that profit by extracting user data cannot provide meaningful privacy controls while also pleasing shareholders and, further, that the paradox is a myth – that managing online privacy is an impossibly complex task no consumer can realistically undertake. Beyond the perceptual limits of users imagining their audiences are structural limits to privacy management. We are sympathetic to this perspective, and would add that much of the research establishing the paradox is skewed towards the culture and technological infrastructure of European and American users.

Videos as Data

Even when uploaders do understand the extent of their audience, they likely do not grasp the many kinds of incidental data they unknowingly share. A simple video of someone dancing in their home may contain enough information to track down their location, identify their family members and friends, or learn about their personal opinions via objects in the background; a reflection of the screen may even be visible in a mirror on the other side of the room (Kutschera et al.). The popularity of videos filmed in public settings has fuelled debates over the rights of uninvolved bystanders whose likeness and activities are incidentally incorporated into someone else's public video (Wu et al.).

Increasingly, the privacy implications of shared data exceed the boundaries of the platform itself. Companies have long extracted user data from social media to sell to advertisers and data brokers (Lamdan). Data which may seem inconsequential at the time of upload can be combined with other forms of data to infer personal details ranging from political opinions to medical diagnoses. It is because of these developments in data analytics and inference that Nissenbaum published an article clarifying that contextual integrity should apply "up and down the data food chain" (Contextual). When a user deletes a video, it may be removed from public view, but platforms' terms of use typically include provisions that allow them to retain a copy. Anything shared should be assumed shared permanently.

In recent years, user data has found a new use: to train large AI models. Reporters for the New York Times discovered in 2024 that OpenAI and Google had turned to YouTube in their quest to find more and more data to train their language (and probably video) models (Metz et al.). In addition to being permanent, disclosures are subject to any new forms of privacy-compromising technological developments in the future. It would not have been possible for someone uploading a video in 2009 to foresee that, even if they achieved a degree of low-view obscurity, AI models would not care about view counts – homework assignments and birthday wishes provide language and video that is just as valuable as that of MrBeast or T-Series (McGrady and Zuckerman). The extent to which these systems leak personal data is an emerging, unresolved issue, but there are signs that concerns are well founded. Contextual integrity is an inevitable casualty here, and models simply do not possess the requisite social reasoning skills to judge information sensitivity (Mireshghallah et al.). Human Rights Watch, for example, found photographs of identifiable children in a popular training dataset.

When we step back and focus on the videos typically neglected by the logics of platform capitalism and popularity, YouTube uploaders are especially susceptible to the tensions between imagined and broadcast audiences. An uploaded public video meant for friends and family can still be seen, shared, repurposed as training data, and researched, regardless of view count. In fact, most of YouTube receives relatively little internal attention. Videos with 10,000 views or more account for 96% of total views on the platform, but constitute just 4% of content (McGrady et al. “Dialing”). Half of all videos – about 7.5 billion as of June 2024 – have 41 views or fewer (Zheng et al.). It is no wonder, then, that many uploaders are content to divulge personal information and aspects of their lives while relying on privacy through obscurity, or perhaps not considering privacy at all.

Cultural Specificity and the Case of Hindi YouTube

A disproportionate amount of research about YouTube is written in English, published in English-speaking countries, and focusses on European or North American users. Studies of imagined audiences in particular neglect all but a handful of developed countries (Sun et al.). Even as there is a broad understanding that research should attend to cultural differences in media use, it is easy for researchers to assume that the way people use a digital platform in one place or culture is generalisable to the rest of the world (Matassi and Boczkowski).

Broad cross-cultural platform comparisons are difficult and thus rare, but valuable to understand the varied motivations and attitudes that emerge from different environments, histories, laws, and traditions. Most such studies take the form of user surveys, interviews, or data donations, but a user-side focus is necessarily limited in its scope and sample size. The alternative is to take a macro perspective, creating representative samples of platform content across languages or geographies.
Such a task is methodologically complex, and not typically possible by using official platform data channels. In 2023 we came up with a method to randomly sample YouTube (McGrady et al. “Dialing”), enabling for the first time a high-level metadata comparison of multiple languages: English, Spanish, Hindi, and Russian. The major finding was a stark difference between Hindi YouTube and the other three language communities. Amongst other differences was a curious pattern of engagement: low-view Hindi videos are much more likely than those in the other three languages to have likes. After watching hundreds of random videos in multiple languages, we hypothesised from this finding that people who upload Hindi YouTube videos are much more likely to have a small or private intended audience (McGrady et al. “One”). We came across expected YouTube genres – lip syncing, dancing, video game streams – but also found many home videos, religious ceremonies, weddings, and messages to loved ones. We are now engaged in a follow-up qualitative study comparing annotated videos in Hindi and English, and preliminary findings point to the same conclusion: accidental vlogs are much more common in Hindi.

The 2014 general election in India was unique for the widespread use of social media by the ruling party as well as the opposition. Following his successful bid for Prime Minister, Narendra Modi introduced the Digital India Campaign, facilitating ambitious public-private partnerships to promote rapid digital transformation. A wave of cheap, accessible smartphones quickly became the primary computing device for most Indians (Agrawal) and, as telecommunications providers like Reliance Jio expanded data infrastructure and offered heavily subsidised mobile data plans (Mukherjee), millions of people became first-time social media users.
While users in the US and Europe progressed relatively slowly through a text-first Internet before adopting social video, large numbers of people in India leapt to video. For much of the history of the Internet, there was not much content for speakers of Indic languages to consume, and access was predominantly limited to urban centres. Those who did want to communicate in their native language encountered poor linguistic support on US/European-focussed platforms and devices. Video bypasses linguistic support, and TikTok became a popular platform for socialisation and entertainment across India, including rural communities and groups with lower literacy rates, until its 2020 ban created a vacuum for YouTube Shorts. We see a sharp rise in Hindi language content on YouTube starting in 2020 (McGrady et al. “One”).

YouTube was designed for US/European cultures, with a one-size-fits-all approach to privacy settings except where legally required (Trepte et al.), and researchers have largely focussed on the same group. YouTube’s localisation efforts focus on language and marketing rather than cultural or infrastructural adaptations (Mohan & Punathambekar). But Indian users are more apt to use social media for social interaction and to strengthen close ties. At the same time, India’s user population includes many new Internet users who are less conscious of online privacy (Arora), and platforms have not made enough of an effort to adapt privacy expectations and assumptions to these different contexts. The obscurity of Indic languages for a large part of the Internet’s history may have contributed to a perception among Indian users that platforms favour English and Western content, and thus setting content to public does not mean something will be seen.

The Indian Internet boom prompted the government to create digital literacy initiatives.
But the modules are limited to equipping users with functional and professional skills (Patankar), operating under the questionable assumption that people who newly gain Internet access are primarily interested in work and education, when the reality is far more complex and involves a range of entertainment-seeking and socialisation behaviours (Arora). Thus, the narrow scope of purely utilitarian literacy models misses socially driven patterns of use, leaving users vulnerable, especially when platforms are predominantly designed with Western users in mind (Glück). In other words, accidental vlogs may be a consequence of new video-first Internet users without access to relevant media literacy resources, interacting with a platform designed for text-first Western cultures.

Research Ethics

Internet privacy debates range from whether it is "dead" (Kelley) to how much privacy people deserve and who is responsible for ensuring it. We are not well positioned to suggest legislative remedies or platform features. When we address the shortcomings of platform architecture and media literacy education, we do so not to propose a direct remedy but to highlight issues that every YouTube researcher should be familiar with. Most importantly: accidental vlogs are not fleeting exceptions but common phenomena. When we study YouTube in particular, but also other video platforms, we have a responsibility to our subjects and the "fluid and fragile" nature of online self-disclosure (Kennedy 410), whether intentional or accidental.

Social media researchers are not part of anyone's imagined audience (Lenhart & Shilton), least of all of people simply using YouTube for personal communication or passive storage. The very act of studying such a video brings contextual integrity into question, and we must take care not to violate it in ways that could negatively affect uploaders.
These are not issues that Institutional Review Boards often take seriously when dealing with public content, and best practices often do not account for the informational leakiness of accidental vlogs. This is not a call to avoid working with accidental vlogs, but to do so with care: to avoid publicising them, to abstract data about them to ensure they cannot be linked – intentionally or incidentally – to the uploader’s personal information, and to share them with other researchers only upon receiving assurances that they will follow similar best practices.

Conclusion

YouTube is not just a platform used to “broadcast yourself”, but also fundamental communications infrastructure used to socialise with friends, family, and small groups, whose contextual integrity risks being violated not just by platform logics, but data analytics, AI companies, and even researchers. Platforms do have a responsibility to better localise privacy, but as Solove argues, we cannot count on for-profit companies to respect users’ interests, nor can we expect users – especially those thrust directly into the surveillance capitalism stage of the Internet – to navigate the complex and often user-hostile privacy apparatus. Governments can support rapid diffusion with practical training. They can also legislate privacy, as the European Union has done with the General Data Protection Regulation, or improve platform transparency through frameworks like the Digital Services Act / Digital Markets Act. Researchers, who are never part of an uploader’s imagined audience and stand to gain data access through such legislation, should take special care when working with accidental vlogs, which are easily susceptible to contextual violations and the leaking of incidental data.

Our research with representative samples of YouTube videos reveals that accidental vlogs – public videos likely meant for a small or private audience which reveal aspects of the subject’s life – are surprisingly common, especially among Hindi-speaking users. We have begun to quantify the phenomenon and to investigate its reasons, and our early findings already point to a need for culturally informed approaches to privacy. Accidental vlogs and their cultural variability evince a failure to properly understand the diversity of needs, motivations, and norms applicable to a platform like YouTube, whose policies and operations are informed by Western-centered expectations.

References

Agrawal, Ravi. India Connected: How the Smartphone Is Transforming the World’s Largest Democracy. Oxford UP, 2020.
Arora, Payal. The Next Billion Users: Digital Life beyond the West. Harvard UP, 2019.
Barnes, Susan B. “A Privacy Paradox: Social Networking in the United States.” First Monday 11.9 (2006).
boyd, danah. Taken out of Context: American Teen Sociality in Networked Publics. PhD thesis. University of California Berkeley, 2008.
“Brazil: Children’s Personal Photos Misused to Power AI Tools.” Human Rights Watch, 10 June 2024.
Glück, Antje. “De-Westernization and Decolonization in Media Studies.” Oxford Research Encyclopedia of Communication. 2018.
Goffman, Erving. The Presentation of Self in Everyday Life. Edinburgh: University of Edinburgh Social Sciences Research Centre, 1956.
Kelley, Jason. “Privacy Isn’t Dead. Far from It.” Electronic Frontier Foundation, 21 Mar. 2024.
Kennedy, Ümit. “The Vulnerability of Contemporary Digital Autobiography.” A/b: Auto/Biography Studies 32.2 (2017): 409–11.
Kutschera, Stefan, et al. “Incidental Data: A Survey towards Awareness on Privacy-Compromising Data Incidentally Shared on Social Media.” Journal of Cybersecurity and Privacy 4.1 (2024): 105–25.
Lamdan, Sarah. Data Cartels: The Companies That Control and Monopolize Our Information. Stanford UP, 2022.
Lange, Patricia G. “Publicly Private and Privately Public: Social Networking on YouTube.” Journal of Computer-Mediated Communication 13.1 (2007): 361–80.
Lenhart, Anna, and Katie Shilton. “‘I Feel like All of This Is Already Happening Anyways’: Context Import and Young Adults’ Perspectives on Researcher Access to TikTok Data.” Social Science Research Network (2025).
Litt, Eden. “Knock, Knock. Who’s There? The Imagined Audience.” Journal of Broadcasting & Electronic Media 56.3 (2012): 330–345.
Marwick, Alice E., and danah boyd. “I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience.” New Media & Society 13.1 (2010): 114–133.
Matassi, Mora, and Pablo J. Boczkowski. To Know Is to Compare: Studying Social Media across Nations, Media, and Platforms. MIT P, 2023.
McGrady, Ryan, and Ethan Zuckerman. “AI Companies Train Language Models on YouTube’s Archive – Making Family-and-Friends Videos a Privacy Risk.” The Conversation, 27 June 2024.
McGrady, Ryan, et al. “Dialing for Videos: A Random Sample of YouTube.” Journal of Quantitative Description: Digital Media 3 (2023).
McGrady, Ryan, et al. “One Platform, Four Languages: Comparing English, Spanish, Hindi, and Russian YouTube.” Social Media + Society 11.3 (2025).
Metz, Cade, et al. “How Tech Giants Cut Corners to Harvest Data for A.I.” The New York Times, 6 Apr. 2024.
Mireshghallah, Niloofar, et al. “Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory.” arXiv.Org, 28 June 2024.
Mitchel, Katherine Kozlowski. “Case Study: Information Privacy & YouTube.” World Libraries 28.1 (2024).
Mohan, Sriram, and Aswin Punathambekar. “Localizing YouTube: Language, Cultural Regions, and Digital Platforms.” International Journal of Cultural Studies 22.3 (2019): 317–333.
Mukherjee, Rahul. “Jio Sparks Disruption 2.0: Infrastructural Imaginaries and Platform Ecosystems in ‘Digital India’.” Media, Culture & Society 41.2 (2018).
Nissenbaum, Helen. Privacy in Context. Stanford UP, 2010.
Nissenbaum, Helen. “Contextual Integrity Up and Down the Data Food Chain.” Theoretical Inquiries in Law 20.1 (2019): 221–56.
Norberg, Patricia A., et al. “The Privacy Paradox: Personal Information Disclosure Intentions versus Behaviors.” Journal of Consumer Affairs 41.1 (2007): 100–26.
Patankar, Rishikesh, et al. “Achieving Universal Digital Literacy for Rural India.” Proceedings of the 10th International Conference on Theory and Practice of Electronic Governance (2017): 528–29.
Solove, Daniel J. On Privacy and Technology. Oxford UP, 2025.
Sun, Jingle, et al. “The Imagined Audience in Social Media: A Systematic Review.” Studies in Media and Communication 13.1 (2024): 301.
Trepte, Sabine, et al. “A Cross-Cultural Perspective on the Privacy Calculus.” Social Media + Society 3.1 (2017).
Wu, Yanlai, et al. “Do Streamers Care about Bystanders’ Privacy? An Examination of Live Streamers’ Considerations and Strategies for Bystanders’ Privacy Management.” Proceedings of the ACM on Human-Computer Interaction 7: CSCW1 (2023): 1–29.
Zheng, Kevin, et al. TubeStats, 17 June 2024.
Ryan McGrady
Harshita Snehi
M/C Journal
www.synapsesocial.com/papers/68ff87e9c8c50a61f2bdd1f9 — DOI: https://doi.org/10.5204/mcj.3201