all models are wrong

finding homologous probes using biomaRt

May 14, 2010

I asked a question on the superb biostar stackexchange site. It’s here: http://biostar.stackexchange.com/questions/1054/homology-bioconductor

It’s about finding genome-wide homologies using Bioconductor. It turns out that Bioconductor has a package called biomaRt, which lets you query the Ensembl databases with ease. (Ensembl stores gene information for a bunch of different organisms.)

I thought I’d write down my solution here, as a sort of extended answer to my question on biostar, in case anyone stumbles on the question there and would like a more complete answer. You’ll need to read the question before any of this code makes sense!

library(biomaRt)

# Map human HG-U133A probe IDs to human Ensembl genes and their
# mouse homologues.
gen_hs2mm <- function(affyids){
    ensembl_hs <- useMart(
        "ensembl",
        dataset = "hsapiens_gene_ensembl"
    )
    hs2mm_filters <- c(
        "affy_hg_u133a",
        "with_mmusculus_homolog"
    )
    hs2mm_gene_atts <- c(
        "affy_hg_u133a",
        "ensembl_gene_id"
    )
    hs2mm_homo_atts <- c(
        "ensembl_gene_id",
        "mouse_ensembl_gene"
    )
    # getBM matches values to filters by position, so the names in
    # this list are arbitrary
    hs2mm_value <- list(
        affyid = affyids,
        with_homolog = TRUE
    )
    # get the human genes and mouse orthologues
    hs2mm_gene <- getBM(
        attributes = hs2mm_gene_atts,
        filters = hs2mm_filters,
        values = hs2mm_value,
        mart = ensembl_hs
    )
    hs2mm_homo <- getBM(
        attributes = hs2mm_homo_atts,
        filters = hs2mm_filters,
        values = hs2mm_value,
        mart = ensembl_hs
    )
    # merge the two tables on the shared ensembl_gene_id column
    # and return the result
    merge(hs2mm_gene, hs2mm_homo)
}

# Map mouse MoGene 1.0 ST probe IDs to mouse Ensembl genes and their
# human homologues.
gen_mm2hs <- function(affyids){
    ensembl_mm <- useMart(
        "ensembl",
        dataset = "mmusculus_gene_ensembl"
    )
    mm2hs_filters <- c(
        "affy_mogene_1_0_st_v1",
        "with_hsapiens_homolog"
    )
    mm2hs_gene_atts <- c(
        "affy_mogene_1_0_st_v1",
        "ensembl_gene_id"
    )
    mm2hs_homo_atts <- c(
        "ensembl_gene_id",
        "human_ensembl_gene"
    )
    # getBM matches values to filters by position, so the names in
    # this list are arbitrary
    mm2hs_value <- list(
        affyids = affyids,
        with_homolog = TRUE
    )
    # get the mouse genes and human orthologues
    mm2hs_gene <- getBM(
        attributes = mm2hs_gene_atts,
        filters = mm2hs_filters,
        values = mm2hs_value,
        mart = ensembl_mm
    )
    mm2hs_homo <- getBM(
        attributes = mm2hs_homo_atts,
        filters = mm2hs_filters,
        values = mm2hs_value,
        mart = ensembl_mm
    )
    # merge the two tables on the shared ensembl_gene_id column
    # and return the result
    merge(mm2hs_gene, mm2hs_homo)
}
library(Biobase)  # provides exprs(); harmless if load_data.r already loads it
source('load_data.r')
# here immgen and cd4T are different expression set objects
# from Bioconductor.
# immgen is mouse data (from the Immunological Genome Project)
# and cd4T is human data
# cd4T can be found on GEO using the accession ID GDS785
# See ref[1]
immgen <- load_immgen()
cd4T <- load_GDS785()
hs2mm <- gen_hs2mm(rownames(exprs(cd4T)))
mm2hs <- gen_mm2hs(rownames(exprs(immgen)))
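# merge() puts the shared ensembl_gene_id column first; rename it per
# species so the two tables share both gene ID column names
colnames(hs2mm)[1] <- 'human_ensembl_gene'
colnames(mm2hs)[1] <- 'mouse_ensembl_gene'
# the final thing is to merge the two tables to make a single
# table containing all the probes that are homologous, along
# with their respective Ensembl IDs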
homol <- merge(hs2mm,mm2hs)
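
To see why that last merge keeps only reciprocal pairs, here is a minimal sketch using toy tables. It relies on merge() joining, by default, on every column name the two data frames share, which here means both human_ensembl_gene and mouse_ensembl_gene. All probe and gene IDs below are invented placeholders, not real Affymetrix or Ensembl identifiers, and the `_toy` names are hypothetical.

```r
# Toy stand-ins for hs2mm and mm2hs after the colnames() renaming.
# Every ID here is an invented placeholder.
hs2mm_toy <- data.frame(
    human_ensembl_gene = c("HS_G1", "HS_G2", "HS_G3"),
    affy_hg_u133a      = c("hs_probe_1", "hs_probe_2", "hs_probe_3"),
    mouse_ensembl_gene = c("MM_G1", "MM_G2", "MM_G9"),
    stringsAsFactors = FALSE
)
mm2hs_toy <- data.frame(
    mouse_ensembl_gene    = c("MM_G1", "MM_G2", "MM_G3"),
    affy_mogene_1_0_st_v1 = c("mm_probe_1", "mm_probe_2", "mm_probe_3"),
    human_ensembl_gene    = c("HS_G1", "HS_G7", "HS_G3"),
    stringsAsFactors = FALSE
)

# The columns merge() will join on when no 'by' argument is given
shared_cols <- intersect(names(hs2mm_toy), names(mm2hs_toy))

# Only rows whose human AND mouse gene IDs agree in both tables survive
homol_toy <- merge(hs2mm_toy, mm2hs_toy)
print(homol_toy)
```

In this toy data only the HS_G1/MM_G1 pair appears in both directions, so it is the only row left. The other rows are dropped because the human-to-mouse and mouse-to-human lookups disagree.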

[1] Lee MS, Hanspers K, Barker CS, Korn AP et al. Gene expression profiles during human CD4+ T cell differentiation. Int Immunol 2004 Aug;16(8):1109-24. PMID: 15210650