Example Movie Dataset

The Example Movies Dataset is used throughout the documentation. It is a filtered and restructured dataset derived from the IMDB dataset. It consists of movies, actors and a character mapping between movies and actors (i.e. who played which character in which movie).

Download

The files are available in the following formats:

| Format | Description |
| --- | --- |
| CSV | The quote character is " and the delimiter is ,. |
| JSONL | Each line is a JSON object. This format is sometimes referred to as Newline Delimited JSON (ndjson). |
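
As a rough illustration of the two formats, the following Python sketch reads each one using only the standard library. The helper names `read_csv_rows` and `read_jsonl_rows` are hypothetical, and the in-memory samples are copied from the examples on this page rather than loaded from the downloaded files.

```python
import csv
import io
import json


def read_csv_rows(text_stream):
    """Yield each CSV row as a dict keyed by the header line."""
    # The quote character and delimiter match the format table above.
    reader = csv.DictReader(text_stream, delimiter=",", quotechar='"')
    yield from reader


def read_jsonl_rows(text_stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in text_stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample rows copied from the examples on this page.
csv_sample = io.StringIO(
    "movie_id,title,year,runtime,genres,actor_ids\n"
    'tt10397752,Disordered,2020,89,"Animation,Documentary,Family",nm11496979\n'
)
jsonl_sample = io.StringIO(
    '{"movie_id":"tt14085636","title":"Flight Paths","year":"2020",'
    '"runtime":"60","genres":"Animation","actor_ids":["nm12334908"]}\n'
)

csv_rows = list(read_csv_rows(csv_sample))
jsonl_rows = list(read_jsonl_rows(jsonl_sample))

print(csv_rows[0]["genres"])       # quoted field, commas preserved
print(jsonl_rows[0]["actor_ids"])  # already a list in JSONL
```

When reading the real files, opening them with `open(path, newline="", encoding="utf-8")` is the usual choice for the `csv` module; the encoding is an assumption, since this page does not state it.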

Download the files here:

| File | Description | CSV | JSONL |
| --- | --- | --- | --- |
| Movies | The list of movies. | movies.csv | movies.jsonl |
| Actors | The list of actors. | actors.csv | actors.jsonl |
| Character Mapping | The mapping between movies and actors. | movie_to_actor_character.csv | movie_to_actor_character.jsonl |

File Structure

Movies

| Field | Description |
| --- | --- |
| movie_id | The unique identifier of the movie. |
| title | The title of the movie. |
| year | The year the movie was released. |
| runtime | The runtime of the movie in minutes. \N will appear if the runtime is unknown. |
| genres | The genres of the movie, as a comma-separated list. |
| actor_ids | The unique identifiers of the actors in the movie. In CSV, this is a comma-separated list. In JSONL, it is an array. |

CSV Example

movie_id,title,year,runtime,genres,actor_ids
tt10396038,Neither Hero Nor Traitor,2020,73,"Drama,History","nm6286339,nm0261957,nm8358700,nm7506705"
tt10397306,Madison,2020,87,"Adventure,Drama,Family","nm10055076,nm0525518,nm0913217,nm12691322"
tt10397752,Disordered,2020,89,"Animation,Documentary,Family",nm11496979

JSONL Example

{"movie_id":"tt14063912","title":"Light of a Burning Moth","year":"2020","runtime":"120","genres":"Drama","actor_ids":["nm10059705","nm12324893","nm4259675"]}
{"movie_id":"tt14085636","title":"Flight Paths","year":"2020","runtime":"60","genres":"Animation","actor_ids":["nm12334908"]}
{"movie_id":"tt14097488","title":"Wolf Cubs of Apple Valley","year":"2020","runtime":"\\N","genres":"Drama,Family","actor_ids":["nm12344653","nm11990620","nm10209159","nm12344730"]}
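
In the JSONL examples above, `year` and `runtime` are strings, `genres` is a single comma-separated string, and the unknown marker appears as `"\\N"` because JSON escapes the backslash. The sketch below shows one way a record could be converted to typed values. The names `MISSING`, `to_int_or_none`, and `normalize_movie` are hypothetical, and the code assumes every record follows the conventions seen in these examples.

```python
import json

# The unknown marker is two characters: a backslash followed by N.
# After JSON decoding, "\\N" in the file becomes exactly this string.
MISSING = "\\N"


def to_int_or_none(value):
    """Convert a numeric string to int, or return None for the marker."""
    return None if value == MISSING else int(value)


def normalize_movie(record):
    """Return a copy of a movie record with typed fields."""
    return {
        "movie_id": record["movie_id"],
        "title": record["title"],
        "year": to_int_or_none(record["year"]),
        "runtime": to_int_or_none(record["runtime"]),
        # Assumption: genres is always a comma-separated string.
        "genres": record["genres"].split(","),
        "actor_ids": list(record["actor_ids"]),
    }


# Third record from the JSONL example above (raw strings keep the
# backslashes exactly as they appear in the file).
line = (
    r'{"movie_id":"tt14097488","title":"Wolf Cubs of Apple Valley",'
    r'"year":"2020","runtime":"\\N","genres":"Drama,Family",'
    r'"actor_ids":["nm12344653","nm11990620","nm10209159","nm12344730"]}'
)

movie = normalize_movie(json.loads(line))
print(movie["runtime"])  # None, because the runtime is unknown
print(movie["genres"])
```

Mapping the marker to `None` is a design choice made for this sketch; keeping the original string may be preferable if the data is written back out unchanged.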

Actors

| Field | Description |
| --- | --- |
| actor_id | The unique identifier of the actor. |
| name | The name of the actor. |
| birthYear | The year the actor was born. \N will appear if the birthYear is unknown. |
| deathYear | The year the actor died. \N will appear if the deathYear is unknown. |
| primaryProfession | The primary professions of the actor. \N will appear if the primaryProfession is unknown. In CSV, this is a comma-separated list. In JSONL, it is an array. |

CSV Example

actor_id,name,birthYear,deathYear,primaryProfession
nm0135292,Carmine Capobianco,1958,2021,"actor,writer,producer"
nm0135427,Joe Capozzi,\N,\N,"actor,producer,writer"
nm0135439,Al Capp,1909,1979,"writer,actor,miscellaneous"
nm0135586,Fritjof Capra,1939,\N,writer

JSONL Example

{"actor_id":"nm0135292","name":"Carmine Capobianco","birthYear":"1958","deathYear":"2021","primaryProfession":["actor","writer","producer"]}
{"actor_id":"nm0135427","name":"Joe Capozzi","birthYear":"\\N","deathYear":"\\N","primaryProfession":["actor","producer","writer"]}
{"actor_id":"nm0135439","name":"Al Capp","birthYear":"1909","deathYear":"1979","primaryProfession":["writer","actor","miscellaneous"]}
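
The CSV form of the actors file can be parsed along the same lines. The following is a minimal sketch, assuming the column names shown above; the `Actor` dataclass and the `parse_actor` helper are hypothetical names introduced for illustration, and treating an unknown `primaryProfession` as an empty list is a choice made here, not something the dataset prescribes.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

MISSING = "\\N"  # backslash followed by N, the unknown-value marker


@dataclass
class Actor:
    actor_id: str
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    professions: List[str]


def parse_actor(row):
    """Build an Actor from one csv.DictReader row."""

    def year(value):
        return None if value == MISSING else int(value)

    raw = row["primaryProfession"]
    professions = [] if raw == MISSING else raw.split(",")
    return Actor(
        actor_id=row["actor_id"],
        name=row["name"],
        birth_year=year(row["birthYear"]),
        death_year=year(row["deathYear"]),
        professions=professions,
    )


# Two rows copied from the CSV example above.
sample = io.StringIO(
    "actor_id,name,birthYear,deathYear,primaryProfession\n"
    'nm0135427,Joe Capozzi,\\N,\\N,"actor,producer,writer"\n'
    "nm0135586,Fritjof Capra,1939,\\N,writer\n"
)

actors = [parse_actor(row) for row in csv.DictReader(sample)]
for actor in actors:
    print(actor)
```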

Character Mapping

| Field | Description |
| --- | --- |
| movie_id | The unique identifier of the movie. |
| actor_id | The unique identifier of the actor. |
| characters | The characters the actor played in the movie. In CSV, this is a pipe-separated list. In JSONL, it is an array. |

CSV Example

movie_id,actor_id,characters
tt11487958,nm7945000,Sara
tt11487958,nm8429647,James
tt11487958,nm5466314,Restaurant Patron|Partygoer

JSONL Example

{"movie_id":"tt11487958","actor_id":"nm7945000","characters":["Sara"]}
{"movie_id":"tt11487958","actor_id":"nm8429647","characters":["James"]}
{"movie_id":"tt11487958","actor_id":"nm5466314","characters":["Restaurant Patron","Partygoer"]}
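
To show how the mapping ties movies and actors together, the sketch below splits the pipe-separated `characters` field from the CSV form and groups the result by movie. The variable name `cast_by_movie` is hypothetical, and the sample rows are copied from the CSV example above.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the Character Mapping CSV example.
sample = io.StringIO(
    "movie_id,actor_id,characters\n"
    "tt11487958,nm7945000,Sara\n"
    "tt11487958,nm8429647,James\n"
    "tt11487958,nm5466314,Restaurant Patron|Partygoer\n"
)

# movie_id -> {actor_id -> list of character names}
cast_by_movie = defaultdict(dict)
for row in csv.DictReader(sample):
    characters = row["characters"].split("|")
    cast_by_movie[row["movie_id"]][row["actor_id"]] = characters

for actor_id, characters in cast_by_movie["tt11487958"].items():
    print(actor_id, characters)
```

The `actor_id` keys can then be looked up in the actors file, and the `movie_id` keys in the movies file, to resolve names and titles.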