Marketing & Business Analytics Professional
Used LDA and Euclidean distance in Python to cluster 1,100+ films by topic and identify the optimal 2014 release date for The Maze Runner, reducing direct-release competition by 23%.
When it comes to movie success, timing is everything. Studios risk millions if a film gets overshadowed by similar releases. My task was to determine the optimal release date for The Maze Runner by analyzing how thematically similar movies clustered throughout 2014.
Instead of relying on gut instinct or rule-of-thumb seasonality, I used topic modeling (LDA) and similarity metrics to quantify competition. This allowed me to pinpoint weeks with minimal thematic overlap, giving The Maze Runner the best shot at box office success.
The challenge required a sophisticated analytical approach that could:
I worked with a dataset of over 1,100 U.S. movie releases, each described by:
Each movie was thus represented as a 10-dimensional topic vector, capturing its thematic DNA.
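To show what a per-movie topic vector looks like, here is a minimal sketch of LDA fitted with a toy collapsed Gibbs sampler in plain Python. It is not the code used in the project: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the keyword lists are all illustrative assumptions, and in practice a library implementation (for example scikit-learn or gensim) would be the usual choice.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the project's code).

    docs is a list of token lists; returns one topic-proportion vector per doc.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed, normalized topic proportions: one vector per document.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical keyword lists standing in for real movie descriptions.
docs = [
    ["maze", "survival", "dystopia", "teen"],
    ["dystopia", "survival", "arena", "teen"],
    ["wedding", "romance", "comedy", "family"],
    ["romance", "comedy", "wedding", "city"],
]
vectors = lda_gibbs(docs, n_topics=2)
```

Each entry of `vectors` sums to 1 and has one component per topic, so calling the function with `n_topics=10` would give the 10-dimensional representation described above.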
To make sense of the model, I analyzed:
For example:
This helped me label and explain each cluster for both qualitative insight and strategic use.
To measure how "competitive" each movie was relative to The Maze Runner, I computed:
I kept only the movies within roughly 2 standard deviations of The Maze Runner's vector; these were its thematic competitors.
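The two similarity measures and the filtering step can be sketched in a few lines of standard-library Python. The cutoff below is one possible reading of the 2-standard-deviation rule (keep films whose Euclidean distance to the target is at most twice the standard deviation of all distances); that reading, the helper names, and the toy vectors are assumptions, not details taken from the project.

```python
import math
from statistics import pstdev


def euclidean(a, b):
    """Straight-line distance between two topic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two topic vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def thematic_competitors(target, others, n_std=2.0):
    """Keep films whose distance to the target is within n_std standard deviations.

    Assumption: the cutoff is n_std times the standard deviation of all
    distances to the target; the source does not spell out the exact rule.
    """
    distances = {title: euclidean(target, vec) for title, vec in others.items()}
    cutoff = n_std * pstdev(list(distances.values()))
    return {title: d for title, d in distances.items() if d <= cutoff}


# Toy 3-topic vectors, purely illustrative (not the project's data).
target = [0.8, 0.1, 0.1]
others = {
    "Film A": [0.7, 0.2, 0.1],
    "Film B": [0.1, 0.8, 0.1],
    "Film C": [0.1, 0.1, 0.8],
}
competitors = thematic_competitors(target, others)
```

With these toy vectors only "Film A" survives the filter, since the other two films sit far from the target in topic space.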
I focused only on 2014 releases. For each week, I calculated:
These three scores were weighted to produce a combined similarity score per week — lower scores meant better strategic fit.
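As a rough illustration of the weighting step, the sketch below combines three per-week component scores into a weighted average and ranks the weeks from lowest to highest. The component values, the weights, and the week labels are hypothetical placeholders; the actual weights used in the analysis are not stated here.

```python
def combined_score(components, weights):
    """Weighted average of one week's component scores."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component score")
    return sum(c * w for c, w in zip(components, weights)) / sum(weights)


def rank_weeks(weekly_components, weights):
    """Return (week, combined score) pairs, lowest combined score first."""
    scored = {
        week: combined_score(components, weights)
        for week, components in weekly_components.items()
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical component scores for two weeks (not the project's data).
weekly_components = {
    "week_a": (0.9, 0.8, 0.7),
    "week_b": (0.2, 0.3, 0.1),
}
weights = (0.5, 0.3, 0.2)  # assumed weights, chosen only for illustration
ranking = rank_weeks(weekly_components, weights)
```

Dividing by the sum of the weights keeps the combined score on the same scale as its components even if the weights do not add up to 1.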
Movie | Euclidean Distance | Cosine Similarity |
---|---|---|
The Twilight Saga: New Moon | 0.042 | 0.997 |
Daybreakers | 0.056 | 0.997 |
28 Weeks Later | 0.063 | 0.995 |
The Conjuring | 0.069 | 0.993 |
The Hunger Games: Catching Fire | 0.111 | 0.985 |
These films share themes like dystopia, survival, suspense, and sci-fi action. While a few horror films (e.g., The Conjuring) appeared, their inclusion reflected overlapping storytelling elements like tension and post-apocalyptic settings.
Rank | Recommended Week | Combined Similarity Score |
---|---|---|
1st Choice | November 7, 2014 | 0.772 |
2nd Choice | May 9, 2014 | 0.618 |
3rd Choice | May 23, 2014 | 0.650 |
My top recommendation was November 7, due to:
I avoided June because it overlaps with exam season, and April because of competition from Easter-timed animated releases.
To support my recommendation, I built a release calendar heatmap showing weekly competition (based on similarity scores) and overlaid major seasonal events (e.g., summer break, Thanksgiving, awards season). This visualization helped translate technical insights into a clear business story.
I also tested the LDA model with:
Conclusion: 10 topics offered the best balance of clarity and clustering performance.
By accurately identifying the optimal release window, this analysis potentially:
This project demonstrated how topic models can turn subjective marketing intuition into structured, defensible strategy. By translating themes into vectors and competition into distance metrics, I helped a hypothetical studio make a multimillion-dollar decision with analytical clarity.
This methodology could extend beyond film scheduling to other competitive landscape analyses in marketing, product launches, and content strategy across industries.