22

Oct

By Benjamin | No Comments

Who chats more, me or my GF? Message analysis in R
This post is dedicated to someone very special, my GF. Hope you like it ;).
Methods
I made a dump of six months (March to August 2015) of our WhatsApp® conversation. I found trends in who chats more (number of messages), who takes more...

18

Oct

By Benjamin | No Comments

Spam comment analysis in R
Imagine logging into your blog and finding more than a hundred spam messages. Not cool! I am not letting the spammers win, so I decided to crack some patterns and try to learn something about these annoying little bots.
For this post, I am performing spam...

30

Apr

Introduction to text mining in R
I was checking some Machine Learning challenges on HackerRank and found a particular challenge that consists of document classification. The source is over here. I downloaded the dataset and decided to do my own text mining analysis instead. The dataset...

20

Apr

By Benjamin | No Comments

Introduction to K-means in R
Quoting Wikipedia:
"k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation...

25

Mar

By Benjamin | No Comments

Using Neural Networks to fit equations in R
Introduction
Quoting Wikipedia:
"In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning algorithms inspired by biological neural networks (the central nervous systems of animals, in...

19

Mar

By Benjamin | No Comments

The tale of two algorithms: the importance of algorithm analysis in our daily programming tasks
Introduction
Quoting Wikipedia: "In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most...

17

Mar

By Benjamin | No Comments

What can we say about world fertility, life expectancy and population size?
Introduction
The purpose of this post is not to "reinvent the wheel", but rather to serve as a playground for a somewhat difficult Data Science problem where the data is highly correlated or not correlated at all...

15

Mar

By Benjamin | No Comments

Handling large FASTA sequence datasets in R: Shuffle and retrieve "n" number of sequences of fixed length from the whole FASTA file and export them in a new FASTA file
Introduction
When you are working with large FASTA datasets, you are likely to find that the sequences are in sort of a mixed...

14

Mar

By Benjamin | No Comments

Extracting upstream regions of a RefSeq human gene list in R using Bioconductor
Introduction
Suppose that you want to do local mapping of the upstream regions of a given list of RefSeq IDs in a particular genome in R using Bioconductor. Download the script here.
In this case, you may take a look at...

14

Mar

By Benjamin | No Comments

Upgrade and update R 2.X to R 3.X in Debian Wheezy 7.X
Introduction
Following the instructions from CRAN, you need to add the R backports to your sources list.
FIRST PART: ADD R BACKPORTS:
First, open a Terminal and open the sources.list file:
$ gksudo gedit /etc/apt/sources.list
Then,...