Mining Twitter Sentiments within an Hour of Today’s Earthquake in Northern California

An earthquake with a magnitude of about 5.7 on the Richter scale struck Northern California today. I quickly ran one of the Twitter mining and sentiment analysis programs that I had developed to get various metrics of the sentiment.

This test drive used a fairly small representative sample of about 1500 tweets.

Here is a video of my program run

Just to interpret the results

Tweets Sentiment Vibe!! [Score between -1.0 and 1.0 range] is–: 2%.

At 2%, this indicates that the sentiment is low, which fits a grave incident.

Tweet’s Objective Perceptions: 5%. This indicates that the perceptions are more subjective than objective, which is OK since many tweeters are not expected to be at ground zero and it is still midnight in the US.

Tweet’s Degree of Certainty: 81%. This reflects the seriousness and certainty of the tweet content related to the topic.

The Tweets Positivity is: 4%. This reflects a low level of positivity: something seems to be obviously wrong, or the mood is closer to negativity.

The metric below reflects the mood of the tweets. The belief mood is significantly higher at 1357, reflecting a strong belief that “it is indeed happening and a fact”, as against a probable or imaginary belief:
(‘The *Belief Mood* is :-‘, 1357)
(‘The *Probable or Imaginary Mood* is :-‘, 7)
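As a rough illustration of how metrics like these could be rolled up from per-tweet scores, here is a minimal Python sketch. It is not the program used above: the per-tweet values are made up, the field layout is hypothetical, and averaging is only one assumed way of aggregating.

```python
from collections import Counter
from statistics import mean

# Made-up per-tweet scores, purely for illustration. In practice an NLP
# library would produce them. Each tuple (hypothetical layout) is:
# (polarity in [-1, 1], subjectivity in [0, 1], certainty in [0, 1], mood)
scored_tweets = [
    (-0.2, 0.9, 0.8, "belief"),
    (0.1, 1.0, 0.9, "belief"),
    (0.0, 0.95, 0.7, "belief"),
    (0.3, 0.9, 0.5, "probable"),
]


def summarise(scored):
    """Average the per-tweet scores and count the moods (assumed aggregation)."""
    subjectivity = mean(s[1] for s in scored)
    return {
        "vibe": mean(s[0] for s in scored),
        "objectivity": 1.0 - subjectivity,
        "certainty": mean(s[2] for s in scored),
        # Share of tweets with strictly positive polarity.
        "positivity": sum(1 for s in scored if s[0] > 0) / len(scored),
        "moods": Counter(s[3] for s in scored),
    }


summary = summarise(scored_tweets)
print(summary)
```

With real data, the mood counts would correspond to the two tallies printed above.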

These are sample metrics with representative results from just 1500 tweets and certain parameter thresholds. However, my belief is that the tweet metrics and their correlation perhaps do reflect the state of moods, perceptions and sentiments on the searched text ‘Northern California’, affected by an earthquake (which was trending high on Twitter).

This software was not run on the cloud, but when I do sentiment analysis on a larger scale from the Twitter firehose, I plan to use AWS DynamoDB and Elasticsearch integrated with my core sentiment analyser software.

Posted in Cloud Computing

Machine Learning & Twitter Recommendation: Should I Go & Watch the Gippi or Aurangazeb Movie?

I wanted to decide whether to go and watch the Gippi or the Aurangazeb movie. Both Bollywood movies were released recently. For fun, I wanted to base the choice on Twitter trends, specifically on the verbs “Watch” and/or “Book” appearing in tweets, which is an implicit way for people to describe or recommend an action in their tweets.

In order to decide this, I did the following:

1. Mined Twitter for the hashtags #Gippi and #AURANGAZEB (about 1500 tweets).

2. Each tweet is parsed linguistically (NLP) and the verbs are extracted (e.g. Watch, Book).

* Here is a sample tweet[image] from Twitter


3. The tweets are then vectorised for supervised machine learning training. For this training, the feature vectors are the verbs along with the tweet text, and the labels that I applied (generated) were ‘GO-GIPPI’ and ‘GO-AURANGAZEB’.

4. A Machine learning algorithm based on K-Nearest Neighbour with Manhattan distance was trained on this data. [I also tried with Euclidean distance]

5. Once trained, I applied the test inputs ‘Watch’ and then ‘Book’ to the trained model and asked the algorithm to predict (classify) each as ‘GO-GIPPI’ or ‘GO-AURANGAZEB’. Each test input is a verb expected in a tweet that implies a recommendation.

6. To my surprise, for both test verbs, Watch and Book (that is, where people write Watch ‘X’ movie or Book ‘X’ movie), the algorithm classified and recommended ‘GO-AURANGAZEB’ as the result [see the results box of my program].
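The steps above can be sketched in a few lines of Python. This is a toy reconstruction, not the original program: the five training tweets are made up, the bag-of-words vectoriser is an assumption about how the text was turned into features, and `k=3` is an arbitrary choice.

```python
from collections import Counter


def vectorise(text, vocabulary):
    """Turn a text into a word-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: manhattan(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training tweets, for illustration only.
training = [
    ("must watch aurangazeb this weekend", "GO-AURANGAZEB"),
    ("going to watch aurangazeb tonight", "GO-AURANGAZEB"),
    ("book tickets for aurangazeb now", "GO-AURANGAZEB"),
    ("watch gippi with your friends", "GO-GIPPI"),
    ("gippi songs are so fun", "GO-GIPPI"),
]
vocabulary = sorted({word for text, _ in training for word in text.split()})
train_vectors = [vectorise(text, vocabulary) for text, _ in training]
train_labels = [label for _, label in training]

for verb in ("watch", "book"):
    query = vectorise(verb, vocabulary)
    print(verb, "->", knn_predict(train_vectors, train_labels, query))
```

The outcome on this toy data only shows the mechanics; with real tweets the result depends on how often each verb co-occurs with each movie.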


So, the Twitter-based recommendation algorithm pointed out that I should go and watch the AURANGAZEB movie!

I quickly wanted to see if there was any quantification to back this recommendation by my machine learning, predictive analytics program.

To my surprise, from this website, the box office collections of Aurangazeb were way higher than Gippi’s box office collections, which is perhaps a direct reflection that the film is doing well and more people are going to watch it! Isn’t it?

As you can see from the picture below [excerpt of website screenshot], Aurangazeb had grossed Rs 14.7 Cr as against Gippi’s Rs 4.5 Cr.


Next, I sampled a public recommendation from the Yahoo Answers website, which again pointed to “AURANGAZEB” as the best movie!


So from this public information, I validated that my program’s prediction from the tweets, recommending “AURANGAZEB”, held up. I am planning to see the movie soon to really check!

As I look at mining large volumes of Twitter data and applying machine learning algorithms to it to make complex predictions, I am going to need more storage, RAM and processing power, and the cloud will be the right place to make this happen. Obviously Amazon AWS is my choice!

Posted in Uncategorized

Amazon AWS’s role in my Data Science pursuits

In a matter of two months, I climbed to the top 1% of Kaggle by solving some very interesting problems provided by leading organisations and by applying various machine learning techniques to complex data. If you don’t know Kaggle, it is a global platform that connects machine learning scientists and engineers with organisations that want to solve their data science problems in the form of competitions.

While I had some exposure to AI-related areas several years ago, I am neither a real data scientist holding a PhD, nor a post-doc researcher, nor an industry veteran working in the field of analytics; I have simply been learning and working on some of the connected areas of late. When I started at Kaggle, I quickly realised that solving complex machine learning problems in the true sense is not for the faint-hearted! I am one of those in the process of getting stronger with every weekend hack these days, and it has been an exciting, intellectually rewarding journey!

Several times over these weekend pursuits, I had to run algorithms on machines with very high capacity, and I had to do it at the lowest cost. Amazon AWS has so far helped me address both problems with its high-memory XL and spot instances, combined with the ability to quickly launch different sets of pre-baked machine learning runtimes through AWS machine images and CloudFormation deployment.

In essence AWS is significantly helping me to leap forward in my data science pursuits.


Posted in Uncategorized

Cloud Based IP Video Surveillance – A demo

There is a slow and steady proliferation of IP-based video surveillance around the world, particularly in the US and Europe, where advanced wired and wireless broadband IP networks have been rapidly gaining a stronghold. My belief is that this IP network proliferation, combined with megapixel high-resolution HD cameras integrated with on-demand cloud services, provides a significant opportunity to create niche IP video surveillance solutions on the cloud.

I believe the cloud, combined with video software technologies, has the potential to become the largest and ‘de facto’ distribution hub for IP video surveillance data in the days to come. Cloud-based storage and content distribution services will bring a paradigm shift in this landscape. For instance, the elastic nature of the cloud will bring petabyte-scale DVRs recording and storing live IP surveillance video, while scalable cloud servers and CDNs can broadcast hundreds and thousands of live streams to global consumers.

As a demonstration of this concept, here is a demo of a solution I created on the Amazon AWS cloud.

The figure below provides a high-level overview of the concept/solution.


Here is the live video of a demo IP camera feed in HD format, transmitted and distributed from the cloud in real time!

Posted in Cloud Computing

Transcoding a 1080P HD Video on ‘Amazon Elastic Cloud Video transcoder’ in 30 seconds

Amazon Web Services today introduced an elastic video transcoding service. I set out to quickly try a 1080P HD video transcoding on it. Basically, I got an HD video transcoded to a 480P SD video in MP4 (H.264 codec) and FLV (Flash video format). Here is how I did it.

1. I downloaded a publicly available 1080P HD video [.mp4] of a space launch, ~500 MB in size, and uploaded it to one of my AWS S3 buckets.


2. This bucket also served as the input bucket for the AWS Elastic transcoder


3.  I created a new Pipeline for AWS Elastic transcoding, specifying the name, input and output buckets


4. Next, I created a transcoding job with the above pipeline and the 1080P HD video file, and specified 480P SD video as the preset for the transcoder output. AWS provides various presets (i.e. for screen/device targets).


5. I started the job; its status can then be monitored. At this point it showed as progressing.

6. Next, I checked the status again and the transcoding was completed. I think it took just about 30 seconds for the entire pipeline job to complete.


7. Next, I checked whether the transcoded video had been saved to the designated S3 bucket. Yes, it had indeed been processed and saved. The output is a 15.8 MB MP4 video file with 480P resolution. (As you can see here, I had also run a similar job with a Flash media file (FLV 480P) as the output.)

I then made the 480P transcoded video public (S3 ACL) for playing and copied the URL.


8. To play this transcoded file, I used the great JW Player’s online wizard and configured its player mode to Flash player [the other mode being HTML5].

9. JW Player played the 480P transcoded MP4 and FLV videos of the spacecraft launch with a great outcome without a noticeable loss in quality.
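The job in steps 3 to 5 could also be submitted from a script rather than through the console. The sketch below is an assumption about how that might look with boto3’s `elastictranscoder` client; the pipeline ID, preset ID and object keys are placeholders, not values from this walkthrough, and `submit_job` needs AWS credentials and an existing pipeline to run.

```python
def build_job_request(pipeline_id, input_key, output_key, preset_id):
    """Assemble the keyword arguments for Elastic Transcoder's create_job call."""
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": input_key},
        "Output": {"Key": output_key, "PresetId": preset_id},
    }


def submit_job(request, region="us-east-1"):
    """Submit the job and return its ID and initial status.

    Requires AWS credentials and an existing pipeline; not called below.
    """
    import boto3  # imported here so the request can be built without boto3

    client = boto3.client("elastictranscoder", region_name=region)
    response = client.create_job(**request)
    return response["Job"]["Id"], response["Job"]["Status"]


# Placeholder values: substitute your own pipeline ID, preset ID and S3 keys.
request = build_job_request(
    pipeline_id="PIPELINE-ID-PLACEHOLDER",
    input_key="space-launch-1080p.mp4",
    output_key="space-launch-480p.mp4",
    preset_id="PRESET-ID-PLACEHOLDER",
)
print(request)
```

Keeping the request builder separate from the network call makes it easy to inspect the job definition before anything is sent to AWS.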

MP4 Format play


FLV  format Play


I believe it is a great new service by AWS, and the pricing at the outset seems to be cheaper than that of some of the third-party vendors providing similar services.

I can only imagine that Amazon has set its sights on the Hollywood studios! That was one of the slides I used to see at some of the AWS conferences I attended, and I believe it is going to be forthcoming. Studios could get thousands of their videos up into S3 through AWS Import/Export :-) and get this done fast and cheap.

Recollecting Dr. Werner Vogels’ quote: “If you have a great idea, the cloud (aws) will execute it for you!”

It looks like the cost of video post-production for studios and media companies is going to fall drastically if AWS Elastic transcoding services can be leveraged in a suitable manner.

Posted in Cloud Computing

OpenNebula Cloud Architecture Survey

OpenNebula recently released an interesting survey reflecting cloud computing usage and adoption trends.

One of the key responses was that 58% of the respondents are running their workloads in non-critical environments or peripheral installations for testing or development applications, while 42% are using the cloud for production workloads. This kind of validates the expected trend that I had blogged about in 2010!

It makes sense most of the time, as non-mission-critical or non-value-add IT assets, which don’t contribute to the ROI and/or carry lower risk, are first-class candidates to be evaluated for cloud migration.

Posted in Uncategorized