What question did this study set out to answer?

The aim is to create a comprehensive, structured dataset addressing previous limitations in movie content analysis.

January 22, 2026 · Open Access

Scene-level movie data from Amazon X-Ray in the US market combined with IMDb

Key Points

Aims to provide a comprehensive, structured, scene-level dataset that overcomes the small scale and inconsistent formats of earlier screenplay collections.
Compiles scene breakdowns for 3,265 movies from Amazon X-Ray in the US Amazon Prime Video market.
Includes subtitles for a subset of 3,110 movies, adding dialogue-level data.
Links each title, and the characters appearing in each scene, to IMDb IDs so that additional metadata can be integrated.
Improves the consistency and accessibility of movie data for quantitative analysis.
Supports large-scale studies of on-screen representation, character interactions, and narrative structure.
Avoids a limitation of earlier screenplay-based datasets, which often rely on draft versions that differ from the final film.

Abstract

This paper presents a structured, scene-level dataset of movie content that addresses the limitations of previous research relying on small or non-standardized screenplay collections. Such collections often lack consistent scene representations and actor metadata, and they use draft versions that differ from the final cinematic product, limiting both the scale and the accuracy of content-level analysis. To overcome these limitations, we compile scene breakdowns for 3,265 movies from Amazon X-Ray in the US Amazon Prime Video market, detailing the characters appearing in each scene and linking them to their corresponding IMDb IDs. Subtitles are included for a subset of 3,110 movies, providing complementary dialogue-level data, and each title is linked to its corresponding IMDb ID to enable augmentation with additional metadata for extended analyses. Integrating these resources allows accurate, large-scale analyses of on-screen representation, character interactions, and narrative structure that were not feasible with earlier screenplay-based datasets. The dataset enhances the consistency and accessibility of movie data, providing a reliable stepping stone for quantitative research on films as cultural artifacts.
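
As an illustration of the kind of character-interaction analysis that scene-level data makes possible, the following is a minimal sketch that counts how often pairs of characters share a scene. The record layout, the field names (`movie_id`, `scene`, `characters`), and the placeholder IDs are assumptions made for this example; they are not the dataset's actual schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene records. Field names and IDs are illustrative
# placeholders, not the dataset's real schema or real IMDb entries.
scenes = [
    {"movie_id": "tt0000001", "scene": 1,
     "characters": ["nm0000001", "nm0000002"]},
    {"movie_id": "tt0000001", "scene": 2,
     "characters": ["nm0000001", "nm0000002", "nm0000003"]},
    {"movie_id": "tt0000001", "scene": 3,
     "characters": ["nm0000003"]},
]


def co_occurrence(scene_records):
    """Count, for each unordered pair of IDs, the scenes they share."""
    counts = Counter()
    for record in scene_records:
        # Deduplicate and sort so each pair has one canonical ordering.
        cast = sorted(set(record["characters"]))
        for pair in combinations(cast, 2):
            counts[pair] += 1
    return counts


pair_counts = co_occurrence(scenes)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

The resulting pair counts could serve as edge weights in a character network, which is one common starting point for studying interactions and narrative structure.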
