Tag : R

22

Oct

Who chats more, me or my GF? Message analysis in R

 
This post is dedicated to someone very special, my GF. Hope you like it ;). Methods: Made a dump of six months (March to August 2015) of our WhatsApp® conversation. Found trends about who chats more (number of messages), who takes more...
18

Oct

Spam comment analysis in R

 
Imagine logging into your blog and finding more than a hundred spam messages. Not cool! I am not letting the spammers win, so I decided to crack some patterns and try to learn something about these annoying little bots. For this post, I am performing spam...
30

Apr

Introduction to text mining in R

 
I was checking some Machine Learning challenges at HackerRank and found a particular challenge that consists of document classification. The source is over here. I downloaded the dataset and decided to make my own text mining analysis instead. The dataset...
20

Apr

Introduction to K-means in R

 
Quoting Wikipedia: "k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation...
25

Mar

Using Neural Networks to fit equations in R

 
Introduction. Quoting Wikipedia: "In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning algorithms inspired by biological neural networks (the central nervous systems of animals, in...
19

Mar

The tale of two algorithms, importance of algorithm analysis in our daily programming tasks

 
Introduction. Quoting Wikipedia: "In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most...
17

Mar

What can we say about world fertility, life expectancy and population size?

 
Introduction. The purpose of this post is not to "reinvent the wheel", but rather to serve as a playground for a somewhat difficult Data Science problem where data is highly correlated or not correlated at all...
15

Mar

Handling large FASTA sequence datasets in R: Shuffle and retrieve “n” number of sequences of fixed length from the whole FASTA file and export them in a new FASTA file

 
Introduction. When you are working with large FASTA datasets, you are likely to find that the sequences are in sort of a mixed...
14

Mar

Extracting upstream regions of a RefSeq human gene list in R using Bioconductor

 
Introduction. Suppose that you want to do local mapping of the upstream regions of a given list of RefSeq IDs in a particular genome in R using Bioconductor. Download the script here. In this case, you may take a look at...
14

Mar

Upgrade and update R 2.X to R 3.X in Debian Wheezy 7.X

 
Introduction. Following the instructions from CRAN, you need to add the R backports to your sources list. First part: add the R backports. First, open a terminal and open the sources.list file: $ gksudo gedit /etc/apt/sources.list Then,...