I wanted to decide whether to go and watch Gippi or Aurangazeb. Both Bollywood movies were released recently. For fun, I wanted to make the choice based on Twitter trends, specifically on whether the verbs “Watch” and/or “Book” appear in the tweets, since these verbs are an implicit way for people to describe or recommend an action in their tweets.
To decide, I did the following:
1. Mined Twitter for the hashtags #Gippi and #AURANGAZEB (about 1500 tweets).
2. Each tweet was parsed linguistically (NLP) and its verbs were extracted (e.g. Watch, Book).
* Here is a sample tweet from Twitter: [image]
3. The tweets were then vectorised for supervised machine learning training. The feature vectors are the verbs along with the tweet text, and the labels I applied (generated) were ‘GO-GIPPI’ and ‘GO-AURANGAZEB’.
4. A K-Nearest Neighbour (KNN) classifier using Manhattan distance was trained on this data. [I also tried Euclidean distance.]
5. Once the model was trained, I applied the test label ‘Watch’ and then ‘Book’ to it and asked the algorithm to predict (classify) each one as ‘GO-GIPPI’ or ‘GO-AURANGAZEB’. Each test label is a verb expected to appear in a tweet that implies a recommendation.
6. To my surprise, for both test labels, the verbs Watch and Book (as when people write “Watch X movie” or “Book X movie”), the algorithm classified the input and recommended ‘GO-AURANGAZEB’ as the result [see the results box of my program].
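Steps 2 to 6 above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the program behind this post: it starts from tweets assumed to be already mined (step 1 is not shown), replaces a real NLP part-of-speech tagger with a tiny hand-written verb list (`VERB_LEXICON`), assumes each tweet's label comes from the hashtag it was mined under, assumes k = 3, and vectorises a test label such as ‘Watch’ as if it were a one-word tweet. The six `TOY_TWEETS` are made up for illustration, so whatever the sketch prints says nothing about the real ~1500 tweets.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for a real NLP part-of-speech tagger:
# a word counts as a verb only if it appears in this small list.
VERB_LEXICON = {"watch", "book", "see", "skip", "go"}

# Made-up tweets for illustration only (NOT the mined data).
# Assumption: each label comes from the hashtag the tweet was mined under.
TOY_TWEETS = [
    ("Must watch #AURANGAZEB", "GO-AURANGAZEB"),
    ("Book tickets for #AURANGAZEB", "GO-AURANGAZEB"),
    ("#AURANGAZEB is a thriller, watch it", "GO-AURANGAZEB"),
    ("#Gippi trailer looks really cute", "GO-GIPPI"),
    ("Maybe skip #Gippi this week", "GO-GIPPI"),
    ("Watch #Gippi with family", "GO-GIPPI"),
]


def tokenize(text):
    """Lower-case the text and keep runs of letters ('#Gippi' -> 'gippi')."""
    return re.findall(r"[a-z]+", text.lower())


def extract_verbs(tokens):
    """Crude verb extraction: keep only tokens found in VERB_LEXICON."""
    return [t for t in tokens if t in VERB_LEXICON]


def features(text):
    """Sparse bag-of-words vector: verb features ('V:watch') plus every word of the text."""
    tokens = tokenize(text)
    vec = Counter("V:" + v for v in extract_verbs(tokens))
    vec.update(tokens)
    return vec


def manhattan(a, b):
    """L1 distance between sparse count vectors (a Counter returns 0 for missing keys)."""
    return sum(abs(a[k] - b[k]) for k in a.keys() | b.keys())


def euclidean(a, b):
    """L2 distance between sparse count vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() | b.keys()))


def knn_predict(train, query_text, k=3, distance=manhattan):
    """Majority vote among the k training vectors closest to the query."""
    query = features(query_text)
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(features(text), label) for text, label in TOY_TWEETS]
    for test_label in ("Watch", "Book"):
        for name, dist in (("Manhattan", manhattan), ("Euclidean", euclidean)):
            print(test_label, name, knn_predict(train, test_label, distance=dist))
```

When every feature count is 0 or 1, as in these toy tweets, each per-feature difference is also 0 or 1, so the Euclidean distance is just the square root of the Manhattan distance and both metrics rank the neighbours identically. The two can only disagree once words repeat within a tweet. Note also that raw counts make short tweets look closer to a one-word query, which is one reason real systems often normalise the vectors.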
So the Twitter-based recommendation algorithm pointed out that I should go and watch the AURANGAZEB movie!!
Next, I wanted to see whether there was any quantitative evidence behind the recommendation made by my machine learning, predictive analytics program.
To my surprise, according to this website, Aurangazeb's box office collections are far higher than Gippi's, which is perhaps a direct reflection that the film is doing well and more people are going to watch it, isn't it?
As you can see from the picture below [excerpt of website screenshot], Aurangazeb had grossed Rs 14.7 Cr against Gippi's Rs 4.5 Cr, more than three times as much.
I also sampled a public recommendation from the Yahoo Answers website, which again pointed to “AURANGAZEB” as the best movie!
So, using this public information, I cross-checked my program's tweet-based recommendation to go to “AURANGAZEB”, which I am planning to see soon to really check for myself!
As I mine larger volumes of Twitter data and apply machine learning algorithms to make more complex predictions, I will need more storage, RAM and processing power, and the cloud is the right place to make this happen. Obviously, Amazon AWS is my choice!