Data Scientist - Benjamin Tovar

Extracting upstream regions of a RefSeq human gene list in R using Bioconductor

14 Mar

 

Introduction

Suppose that you want to map, locally in R with Bioconductor, the upstream regions of a given set of RefSeq IDs in a particular genome. Download the script here.

For this task, take a look at the Bioconductor AnnotationData packages here.

The goal of this post is as follows: given the RefSeq IDs below, I want to extract the 250 bases upstream of each gene into a single list, together with other useful information such as the Entrez ID, the gene symbol and the gene description.


# RefSeq IDs:
gene.list.refseq
# How many bases upstream of each gene:
bases.upstream
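
The two inputs above are shown only by name, so here is a minimal sketch of how they might be defined. The accessions NM_000546 (TP53) and NM_007294 (BRCA1) are illustrative picks and are not the gene list from the original post; `is.refseq.mrna` is a hypothetical helper added only to catch malformed IDs early.

```r
# Illustrative inputs; these accessions are NOT taken from the original post.
# NM_000546 (TP53) and NM_007294 (BRCA1) are used purely as examples.
gene.list.refseq <- c("NM_000546", "NM_007294")

# 250 bases upstream, matching the goal stated in the post.
bases.upstream <- 250L

# Hypothetical helper: TRUE for strings that look like RefSeq mRNA
# accessions ("NM_" followed by digits, with an optional ".version" suffix).
is.refseq.mrna <- function(ids) {
  grepl("^NM_[0-9]+(\\.[0-9]+)?$", ids)
}

# Stop early if an input is malformed.
stopifnot(all(is.refseq.mrna(gene.list.refseq)),
          is.numeric(bases.upstream), bases.upstream > 0)
```

The pattern only accepts the `NM_` prefix; if your list mixes in other RefSeq classes (for example `NR_` non-coding transcripts), the check would need to be widened.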

Before starting, please install the following packages in R:


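# biocLite() was retired in Bioconductor 3.8; BiocManager is its replacement.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("BSgenome.Hsapiens.UCSC.hg19", "Biostrings", "org.Hs.eg.db"))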

Load the function from the downloaded script (the file must be in your working directory):
source("extract.five.utr.sequence.R")

Finally, run the extraction:

output.sequences <- extract.five.utr.sequence(gene.list.refseq, bases.upstream)
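
The internals of `extract.five.utr.sequence` are not shown in this post, so the following is only a sketch of the coordinate arithmetic that any upstream extraction has to get right: "upstream" depends on the strand of the gene. The helper `upstream.window` and the toy coordinates are hypothetical and written in base R; they are not taken from the downloadable script.

```r
# Hypothetical helper, not the function from the post: compute the 1-based,
# inclusive genomic interval covering `bases.upstream` bases immediately
# upstream of each gene.
#   start, end : gene coordinates on the reference strand, with start <= end
#   strand     : "+" or "-"
upstream.window <- function(start, end, strand, bases.upstream) {
  stopifnot(all(strand %in% c("+", "-")), all(start <= end))
  plus <- strand == "+"
  # On "+", the gene begins at `start`, so upstream lies before it.
  # On "-", the gene begins at `end`, so upstream lies after it.
  win.start <- ifelse(plus, start - bases.upstream, end + 1)
  win.end   <- ifelse(plus, start - 1, end + bases.upstream)
  # Clip at position 1 so a gene near the chromosome start stays valid.
  data.frame(start = pmax(win.start, 1), end = win.end, strand = strand,
             stringsAsFactors = FALSE)
}

# Toy coordinates, made up for illustration only.
upstream.window(start = c(1000, 5000), end = c(2000, 6000),
                strand = c("+", "-"), bases.upstream = 250)
```

With these toy values the plus-strand gene gets positions 750 to 999 and the minus-strand gene gets 6001 to 6250. For minus-strand genes the extracted sequence also has to be reverse-complemented to read 5' to 3'. In Bioconductor, `promoters()` from GenomicRanges (with `downstream = 0`) and `getSeq()` from BSgenome handle both steps, although whether the script uses them is not stated here.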
