Data Scientist - Benjamin Tovar

Who chats more, me or my GF? Message analysis in R

22 Oct

 


This post is dedicated to someone very special, my GF. Hope you like it ;).

Methods

I exported six months (March to August 2015) of our WhatsApp® conversation. I then looked at who chats more (number of messages) and who takes longer to reply, broken down by month, day of the month and weekday, and finally plotted word clouds showing the most frequent words per author.
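As a rough illustration of the first step, here is a minimal sketch of reading an exported chat into a data frame and counting messages per author. It is not the code used for this post. The line layout ("dd/mm/yyyy, HH:MM - Author: text"), the sample messages and the names `raw`, `pattern` and `chat` are all assumptions; real exports vary with phone and locale, and this sketch simply drops lines that do not match, such as the continuation lines of multi-line messages.

```r
# Hypothetical sample lines; a real export's layout depends on phone and locale.
raw <- c(
  "01/03/2015, 09:15 - Ben: buenos dias amor",
  "01/03/2015, 09:40 - Anne: jajaja hola",
  "01/03/2015, 09:42 - Ben: que haces?",
  "02/03/2015, 18:05 - Anne: ya casi salgo"
)

# Assumed layout: timestamp, " - ", author, ": ", message text.
pattern <- "^([0-9]{2}/[0-9]{2}/[0-9]{4}, [0-9]{2}:[0-9]{2}) - ([^:]+): (.*)$"

# Keep only the lines that look like the start of a message.
is_msg <- grepl(pattern, raw)
parts  <- do.call(rbind, regmatches(raw[is_msg], regexec(pattern, raw[is_msg])))

# Column 1 of `parts` is the full match; columns 2 to 4 are the captured groups.
chat <- data.frame(
  time   = as.POSIXct(parts[, 2], format = "%d/%m/%Y, %H:%M", tz = "UTC"),
  author = parts[, 3],
  text   = parts[, 4],
  stringsAsFactors = FALSE
)

# Number of messages per author.
n_messages <- table(chat$author)
print(n_messages)
```

Everything that follows in the analysis only needs the three columns built here: when a message was sent, who sent it, and what it said.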

Results

Of the 36,129 messages analysed, 19,550 were mine and 16,579 were from Anne, so I won this round (Ben:1, Anne:0).

[Figure: n_messages_per_author]

Who takes longer to reply to messages? OK, I definitely took longer, with an average of 8.0 minutes against Anne's 5.4. Anne won this round (Ben:1, Anne:1).

[Figure: diff_mins_messages_per_author]
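One way such a reply time could be computed is sketched below. The post does not spell out its exact definition, so this is an assumption: a message counts as a reply when its author differs from the author of the previous message, and its delay is the number of minutes since that previous message. The sample data and the names `gap_mins`, `is_reply` and `avg_reply` are made up for the example.

```r
# Hypothetical sample conversation, already parsed into time and author.
chat <- data.frame(
  time = as.POSIXct(c("2015-03-01 09:00", "2015-03-01 09:10", "2015-03-01 09:12",
                      "2015-03-01 09:16", "2015-03-01 09:22"), tz = "UTC"),
  author = c("Anne", "Ben", "Ben", "Anne", "Ben"),
  stringsAsFactors = FALSE
)
chat <- chat[order(chat$time), ]
n <- nrow(chat)

# Minutes elapsed since the previous message (undefined for the first one).
gap_mins <- c(NA, as.numeric(difftime(chat$time[-1], chat$time[-n], units = "mins")))

# Assumed definition: a reply is a message whose author differs from the
# author of the message right before it.
is_reply <- c(FALSE, chat$author[-1] != chat$author[-n])

replies <- chat[is_reply, ]
replies$gap_mins <- gap_mins[is_reply]

# Average reply delay in minutes, per author.
avg_reply <- tapply(replies$gap_mins, replies$author, mean)
print(avg_reply)
```

Under this definition, consecutive messages from the same person do not count as replies, and an overnight gap counts in full. A real analysis might want to cap or drop very long gaps.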

Comparing the number of messages and the average minutes between messages, by author and day of the month.

Looking at the top barplot, it took me an average of 25 minutes to reply to messages on the 1st of each month analysed. I also tend to take longer to reply until the 3rd, and the pattern appears again in the last days of the month (30th to 31st). I am usually busier on these days (paying bills, rent, not cool bro, not cool). Anne, in contrast, has Delta values between days that are closer to 0 than mine (this means that she’s more constant in the number of messages sent). Anne won this round (Ben:1, Anne:2).

Exploring the bottom barplot, it looks like we send a similar number of messages per day. That is, taking the 2nd and the 21st as examples, the number of messages differs between these days, but within the same day the number of messages from each of us is very similar (like a linear correlation). So we reply to each other's messages proportionally. This round is a draw (Ben:2, Anne:3).

[Figure: d_messages_minutes_day_author]

Same analysis, but now comparing by month. Yep, I always took more time to reply to messages, but in my defence, I always sent more of them. As these trends were already discussed, no extra points.

[Figure: d_messages_minutes_month_author]

Same analysis, but now comparing by weekday. Conclusion is the same as above.

[Figure: d_messages_minutes_weekday_author]
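The day, month and weekday comparisons all rest on the same grouping step: derive a calendar field from each timestamp, then count messages per field and author. Below is a minimal sketch under assumed data. The sample messages and the names `by_day`, `by_month` and `by_weekday` are hypothetical, and the weekday labels are set by hand so the result does not depend on the system locale.

```r
# Hypothetical sample messages, already parsed into time and author.
chat <- data.frame(
  time = as.POSIXct(c("2015-03-01 09:00", "2015-03-01 09:10", "2015-03-02 20:00",
                      "2015-04-05 11:00", "2015-04-05 11:30"), tz = "UTC"),
  author = c("Ben", "Anne", "Ben", "Anne", "Ben"),
  stringsAsFactors = FALSE
)

# POSIXlt exposes the calendar fields directly:
# mday is 1..31, mon is 0..11, wday is 0..6 with Sunday as 0.
lt <- as.POSIXlt(chat$time, tz = "UTC")
chat$day     <- lt$mday
chat$month   <- lt$mon + 1
chat$weekday <- factor(lt$wday, levels = 0:6,
                       labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# Message counts per calendar field and author.
by_day     <- table(chat$day, chat$author)
by_month   <- table(chat$month, chat$author)
by_weekday <- table(chat$weekday, chat$author)
print(by_weekday)

# A grouped barplot of any of these tables could then be drawn with, e.g.:
# barplot(t(by_day), beside = TRUE, legend.text = TRUE)
```

Swapping `table()` for `tapply()` over a reply-delay column would give the average minutes per group instead of the counts.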

Additionally, Sundays are usually the days with the most messages, but the difference from the other weekdays is not very noticeable.

[Figure: n_messages_per_weekday]

Wordcloud: Top 2000 distinctive words per author

Despite the odds, yep, I say “amor” more frequently than you! (Besides, your most distinctive word is “jajaja” >.<) So I won this round (Ben:3, Anne:3).

[Figure: comparison_cloud_top_2000_words]

Wordcloud: Top 2000 common words (regardless of author)

No points for anyone. Our most common shared word is “que” (what in English).

[Figure: commonality_cloud_top_2000_words]
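Both clouds can be built from one word-by-author count matrix. The figure names hint at the comparison and commonality clouds offered by the wordcloud package, but that is a guess, so the sketch below builds the matrix in base R and only calls that package if it is installed and the session is interactive. The sample messages, the `tokenize` helper and the names `term_matrix` and `shared` are hypothetical.

```r
# Hypothetical sample messages.
messages <- data.frame(
  author = c("Ben", "Ben", "Anne", "Anne"),
  text   = c("hola amor que haces", "amor ya voy", "jajaja que risa", "jajaja ya"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case the text and split on anything that is not a letter.
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^[:alpha:]]+"))
  words[nzchar(words)]
}

# One vector of words per author, then a shared vocabulary.
tokens <- lapply(split(messages$text, messages$author), tokenize)
vocab  <- sort(unique(unlist(tokens)))

# Rows are words, columns are authors, cells are counts.
term_matrix <- sapply(tokens, function(w) as.vector(table(factor(w, levels = vocab))))
rownames(term_matrix) <- vocab
print(term_matrix)

# Words used by every author: the raw material for a "common words" cloud.
shared <- term_matrix[rowSums(term_matrix > 0) == ncol(term_matrix), , drop = FALSE]

# Optional plotting, assuming the wordcloud package is available.
if (interactive() && requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::comparison.cloud(term_matrix, max.words = 2000)
  wordcloud::commonality.cloud(term_matrix, max.words = 2000)
}
```

In this toy data “amor” appears only in Ben's column and “jajaja” only in Anne's, while “que” and “ya” appear in both, which is the kind of split the two clouds visualise.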

Conclusion

The scoreboard shows 3 points for me and 3 points for Anne, so it is a draw ;). Hope you like the post.

Ben
