Example Movie Dataset
The Example Movie Dataset is used throughout the documentation. It is a filtered and restructured subset of the IMDb dataset. It consists of movies, actors, and a character mapping between movies and actors (i.e., who played which character in which movie).
Download
The files are available in the following formats:
Format | Description |
---|---|
CSV | Fields are delimited by a comma (,) and quoted with a double quote ("). |
JSONL | Each line is a JSON object. This format is sometimes referred to as Newline Delimited JSON (NDJSON). |
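As a minimal sketch of how the two formats can be read, the snippet below uses only Python's standard library. The helper names `read_csv_rows` and `read_jsonl_rows` are hypothetical, and the inline samples stand in for the downloaded files so the snippet runs on its own.

```python
import csv
import io
import json

# Inline samples standing in for movies.csv and movies.jsonl.
CSV_SAMPLE = '''movie_id,title,year,runtime,genres,actor_ids
tt10397752,Disordered,2020,89,"Animation,Documentary,Family",nm11496979
'''
JSONL_SAMPLE = '''{"movie_id":"tt14085636","title":"Flight Paths","year":"2020","runtime":"60","genres":"Animation","actor_ids":["nm12334908"]}
'''


def read_csv_rows(stream):
    """Read CSV rows as dicts, using comma as delimiter and double quote as quote character."""
    return list(csv.DictReader(stream, delimiter=",", quotechar='"'))


def read_jsonl_rows(stream):
    """Read one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


csv_rows = read_csv_rows(io.StringIO(CSV_SAMPLE))
jsonl_rows = read_jsonl_rows(io.StringIO(JSONL_SAMPLE))

print(csv_rows[0]["genres"])
print(jsonl_rows[0]["actor_ids"])
```

With real files, the `io.StringIO(...)` objects would be replaced by `open(path, newline="", encoding="utf-8")` for CSV and `open(path, encoding="utf-8")` for JSONL; the UTF-8 encoding is an assumption, not something this page states.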
Download the files here:
File | Description | CSV | JSONL |
---|---|---|---|
Movies | The list of movies. | movies.csv | movies.jsonl |
Actors | The list of actors. | actors.csv | actors.jsonl |
Character Mapping | The mapping between movies and actors. | movie_to_actor_character.csv | movie_to_actor_character.jsonl |
File Structure
Movies
Field | Description |
---|---|
movie_id | The unique identifier of the movie. |
title | The title of the movie. |
year | The year the movie was released. |
runtime | The runtime of the movie in minutes. \N will appear if the runtime is unknown. |
genres | The genres of the movie. In both CSV and JSONL, this is a comma-separated list. |
actor_ids | The unique identifiers of the actors in the movie. In CSV, this is a comma-separated list. In JSONL, it is an array. |
CSV Example
movie_id,title,year,runtime,genres,actor_ids
tt10396038,Neither Hero Nor Traitor,2020,73,"Drama,History","nm6286339,nm0261957,nm8358700,nm7506705"
tt10397306,Madison,2020,87,"Adventure,Drama,Family","nm10055076,nm0525518,nm0913217,nm12691322"
tt10397752,Disordered,2020,89,"Animation,Documentary,Family",nm11496979
JSONL Example
{"movie_id":"tt14063912","title":"Light of a Burning Moth","year":"2020","runtime":"120","genres":"Drama","actor_ids":["nm10059705","nm12324893","nm4259675"]}
{"movie_id":"tt14085636","title":"Flight Paths","year":"2020","runtime":"60","genres":"Animation","actor_ids":["nm12334908"]}
{"movie_id":"tt14097488","title":"Wolf Cubs of Apple Valley","year":"2020","runtime":"\\N","genres":"Drama,Family","actor_ids":["nm12344653","nm11990620","nm10209159","nm12344730"]}
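The two formats differ slightly for movies: `actor_ids` is a string in CSV but an array in JSONL, numeric fields arrive as strings, and `\N` marks an unknown runtime. The sketch below shows one way to normalize a record from either format into the same shape. The function names are hypothetical, and treating `\N` as "no value" for `year` and `genres` is an assumption, since the field table only mentions it for `runtime`.

```python
import csv
import io
import json

MISSING = r"\N"  # marker the dataset uses for unknown values


def split_list(value):
    """Accept a comma-separated string (CSV) or a list (JSONL)."""
    if isinstance(value, list):
        return value
    if value == MISSING:  # assumption: an unknown list is treated as empty
        return []
    return [item for item in value.split(",") if item]


def to_int_or_none(value):
    """Convert a numeric string to int, mapping the missing marker to None."""
    return None if value == MISSING else int(value)


def normalize_movie(raw):
    """Return a movie dict with the same types regardless of source format."""
    return {
        "movie_id": raw["movie_id"],
        "title": raw["title"],
        "year": to_int_or_none(raw["year"]),
        "runtime": to_int_or_none(raw["runtime"]),
        "genres": split_list(raw["genres"]),
        "actor_ids": split_list(raw["actor_ids"]),
    }


# Sample rows copied from the CSV and JSONL examples above.
csv_text = (
    "movie_id,title,year,runtime,genres,actor_ids\n"
    'tt10397306,Madison,2020,87,"Adventure,Drama,Family",'
    '"nm10055076,nm0525518,nm0913217,nm12691322"\n'
)
jsonl_text = (
    r'{"movie_id":"tt14097488","title":"Wolf Cubs of Apple Valley",'
    r'"year":"2020","runtime":"\\N","genres":"Drama,Family",'
    r'"actor_ids":["nm12344653","nm11990620","nm10209159","nm12344730"]}'
)

movie_from_csv = normalize_movie(next(csv.DictReader(io.StringIO(csv_text))))
movie_from_jsonl = normalize_movie(json.loads(jsonl_text))

print(movie_from_csv["runtime"], movie_from_csv["genres"])
print(movie_from_jsonl["runtime"], movie_from_jsonl["genres"])
```

Mapping `\N` to `None` keeps unknown runtimes out of numeric aggregations such as averages, which would otherwise fail on the string marker.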
Actors
Field | Description |
---|---|
actor_id | The unique identifier of the actor. |
name | The name of the actor. |
birthYear | The year the actor was born. \N will appear if the birthYear is unknown. |
deathYear | The year the actor died. \N will appear if the deathYear is unknown. |
primaryProfession | The primary professions of the actor. \N will appear if the primaryProfession is unknown. In CSV, this is a comma-separated list. In JSONL, it is an array. |
CSV Example
actor_id,name,birthYear,deathYear,primaryProfession
nm0135292,Carmine Capobianco,1958,2021,"actor,writer,producer"
nm0135427,Joe Capozzi,\N,\N,"actor,producer,writer"
nm0135439,Al Capp,1909,1979,"writer,actor,miscellaneous"
nm0135586,Fritjof Capra,1939,\N,writer
JSONL Example
{"actor_id":"nm0135292","name":"Carmine Capobianco","birthYear":"1958","deathYear":"2021","primaryProfession":["actor","writer","producer"]}
{"actor_id":"nm0135427","name":"Joe Capozzi","birthYear":"\\N","deathYear":"\\N","primaryProfession":["actor","producer","writer"]}
{"actor_id":"nm0135439","name":"Al Capp","birthYear":"1909","deathYear":"1979","primaryProfession":["writer","actor","miscellaneous"]}
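Because `birthYear`, `deathYear`, and `primaryProfession` can each be `\N`, it can help to load actors into a typed structure with optional fields. The following is a minimal sketch; the `Actor` dataclass and its snake_case attribute names are illustrative choices, not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

MISSING = "\\N"  # backslash followed by N, as it appears in the files


@dataclass
class Actor:
    actor_id: str
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]
    professions: List[str]


def parse_year(value: str) -> Optional[int]:
    """Map the missing marker to None, otherwise parse the year as int."""
    return None if value == MISSING else int(value)


def parse_professions(value) -> List[str]:
    """Accept a JSONL array or a CSV comma-separated string."""
    if isinstance(value, list):
        return value
    if value == MISSING:
        return []
    return value.split(",")


def actor_from_record(raw: dict) -> Actor:
    """Build an Actor from a CSV row dict or a decoded JSONL object."""
    return Actor(
        actor_id=raw["actor_id"],
        name=raw["name"],
        birth_year=parse_year(raw["birthYear"]),
        death_year=parse_year(raw["deathYear"]),
        professions=parse_professions(raw["primaryProfession"]),
    )


# Sample rows copied from the CSV example above.
csv_text = (
    "actor_id,name,birthYear,deathYear,primaryProfession\n"
    'nm0135427,Joe Capozzi,\\N,\\N,"actor,producer,writer"\n'
    "nm0135586,Fritjof Capra,1939,\\N,writer\n"
)

actors = [actor_from_record(row) for row in csv.DictReader(io.StringIO(csv_text))]
for actor in actors:
    print(actor)
```

The same `actor_from_record` function accepts objects produced by `json.loads` on a JSONL line, since the field names are identical in both formats.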
Character Mapping
Field | Description |
---|---|
movie_id | The unique identifier of the movie. |
actor_id | The unique identifier of the actor. |
characters | The characters the actor played in the movie. In CSV, this is a list separated by the pipe character (|). In JSONL, it is an array. |
CSV Example
movie_id,actor_id,characters
tt11487958,nm7945000,Sara
tt11487958,nm8429647,James
tt11487958,nm5466314,Restaurant Patron| Partygoer
JSONL Example
{"movie_id":"tt11487958","actor_id":"nm7945000","characters":["Sara"]}
{"movie_id":"tt11487958","actor_id":"nm8429647","characters":["James"]}
{"movie_id":"tt11487958","actor_id":"nm5466314","characters":["Restaurant Patron","Partygoer"]}
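The mapping file is what connects movies to actors, so a common first step is to group it by `movie_id`. The sketch below splits the pipe-separated `characters` field and builds a nested lookup. The names `split_characters` and `cast_by_movie` are hypothetical, and stripping whitespace around each character name is an assumption based on the space after the pipe in the CSV example.

```python
import csv
import io
from collections import defaultdict


def split_characters(value):
    """Accept a JSONL array or a CSV pipe-separated string."""
    if isinstance(value, list):
        return value
    # Strip surrounding whitespace, e.g. "Restaurant Patron| Partygoer".
    return [part.strip() for part in value.split("|") if part.strip()]


# Sample rows copied from the CSV example above.
csv_text = (
    "movie_id,actor_id,characters\n"
    "tt11487958,nm7945000,Sara\n"
    "tt11487958,nm8429647,James\n"
    "tt11487958,nm5466314,Restaurant Patron| Partygoer\n"
)

# movie_id -> {actor_id -> [character, ...]}
cast_by_movie = defaultdict(dict)
for row in csv.DictReader(io.StringIO(csv_text)):
    cast_by_movie[row["movie_id"]][row["actor_id"]] = split_characters(row["characters"])

for actor_id, characters in cast_by_movie["tt11487958"].items():
    print(actor_id, characters)
```

The resulting lookup can be combined with the movies and actors files by joining on `movie_id` and `actor_id` respectively.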