百度360必应搜狗淘宝本站头条
当前位置:网站首页 > 热门文章 > 正文

单细胞工具MARVEL—单细胞可变剪切分析(一)

bigegpt 2024-08-11 14:31 4 浏览

注:笔记内容参考官方教程用法摸索而成。因涉及步骤较繁琐,如有错误,欢迎指出!

R包MARVEL是由牛津大学MRC Weatherall分子医学研究所团队开发的,用于分析单细胞水平的可变剪切事件。相关文章于2023年1月在Nucleic Acids Research期刊发表,在其github页面分享了MARVEL工具的分析流程,在此学习、记录如下。


Pape:https://doi.org/10.1093/nar/gkac1260


Github:https://github.com/wenweixiong/MARVEL


Plate pipeline:https://wenweixiong.github.io/MARVEL_Plate.html


Droplet pipeline:https://wenweixiong.github.io/MARVEL_Droplet.html

可变剪切

单细胞数据类型

  • AS分析的核心是通过检测比对到splice junction位点的read(例如,跨越两个外显子)进行分析。
  • 目前主流的单细胞测序技术主要分为Droplet-based 10X方法与Plate-based Smart-seq方法。
  • 10X技术由于仅对cDNA的3'端部分进行测序,只能对该区域涉及的splice junction进行分析,且无法具体分析事件类型
  • Smart-seq技术虽然通量低,但是全长转录本的单细胞测序原理,可以进行更为深入的AS分析。

一、Plate-based pipeline

1、上游分析

1.1 工作环境

两个同级的目录(1)basic:参考数据以及代码脚本 (2)proj1:具体的分析项目文件

  • 后续分析均在 proj1/ 路径下操作
tree basic/ -L 2
# basic/
# ├── gtf
# │   ├── gencode.v31.annotation.gtf
# │   ├── GRCh38.primary_assembly.genome.fa
# │   ├── Log.out
# │   ├── rsem_index
# │   └── star_index
# ├── rscript_Anno_SJ_rMATs.R
# ├── rscript_bedtools_input.R
# └── single_sample_upstream.sh
tree proj1/ -L 2
# proj1/
# ├── SJ_phenoData.txt
# └── work
#     ├── fq
#     ├── intron
#     ├── merge
#     ├── rmats
#     ├── rsem
#     └── star

1.2 conda环境

# 名为marvel_plate的分析环境
conda create -n marvel_plate python=3.7
conda activate marvel_plate
conda install -c hcc aspera-cli
conda install -c bioconda star=2.7.1a rmats=4.1.0 -y 
conda install -c bioconda bedtools samtools rsem -y 
conda install -c r r-base r-essentials r-tidyverse -y

1.2 准备数据

(1)测序数据

  • 官方手册分析了如下数据集的 192个单细胞数据,以其中一个演示流程
# https://www.ebi.ac.uk/ena/browser/view/ERR1562083
id=ERR1562083
# id=$1
mkdir -p ./work/fq/${id}
ascp -QT -l 300m -P33001  \
-i ~/miniconda3/envs/marvel_plate/etc/asperaweb_id_dsa.openssh   \
era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/${id:0:6}/00${id:0-1}/${id}/${id}_1.fastq.gz  ./work/fq/${id}

ascp -QT -l 300m -P33001  \
-i ~/miniconda3/envs/marvel_plate/etc/asperaweb_id_dsa.openssh   \
era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/${id:0:6}/00${id:0-1}/${id}/${id}_2.fastq.gz  ./work/fq/${id}

(2)基因组数据

在 ./basic/gtf 目录下

https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/

## STAR索引
STAR  --runMode genomeGenerate \
--genomeDir star_index \
--genomeFastaFiles GRCh38.primary_assembly.genome.fa \
--sjdbGTFfile gencode.v31.annotation.gtf \
--runThreadN 40

## RSEM索引
rsem-prepare-reference \
--gtf gencode.v31.annotation.gtf \
--star  -p 20 \
GRCh38.primary_assembly.genome.fa \
rsem_index/hg38

(3)教程数据

  • 官方教程提供了各个分析步骤的示范文件,可供之后分析作为参考。
  • http://datashare.molbiol.ox.ac.uk/public/wwen/MARVEL/Tutorial/Plate/Data.zip
ls Data
# GTF  MARVEL  rMATS  R Object  RSEM  SJ

1.3 分析步骤

1.3.1 单样本4步分析

(1)单样本STAR比对

将每个样本的Fastq文件使用STAR软件进行比对,产生每个splice junction count matrix

# Reference
less -SN ./Data/SJ/SJ.txt

mkdir -p ./work/star/${id}
ref_idx_star=../basic/gtf/star_index
ref_gtf=../basic/gtf/gencode.v31.annotation.gtf
echo ${id} is runing......
#1st pass mode
STAR --runThreadN 16 \
--genomeDir ${ref_idx_star} \
--readFilesCommand zcat \
--readFilesIn ./work/fq/${id}/${id}_1.fastq.gz ./work/fq/${id}/${id}_2.fastq.gz \
--outFileNamePrefix ./work/star/${id}/${id}. \
--outSAMtype None
#2nd pass mode
STAR --runThreadN 16 \
--genomeDir ${ref_idx_star} \
--readFilesCommand zcat \
--readFilesIn ./work/fq/${id}/${id}_1.fastq.gz ./work/fq/${id}/${id}_2.fastq.gz \
--outFileNamePrefix ./work/star/${id}/${id}. \
--sjdbFileChrStartEnd ./work/star/${id}/*SJ.out.tab \
--sjdbGTFfile ${ref_gtf} \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI AS nM XS \
--quantMode TranscriptomeSAM

(2)单样本rMATs注释

根据上述比对BAM结果,注释出每个基因(转录本)涉及的可变剪切事件类型。使用rMATs软件,可注释出5种类型:SE、RI、MXE、A5SS、A3SS。

find ./Data/rMATS/ -name *featureData.txt
# ./Data/rMATS/SE/SE_featureData.txt
# ./Data/rMATS/A3SS/A3SS_featureData.txt
# ./Data/rMATS/MXE/MXE_featureData.txt
# ./Data/rMATS/RI/RI_featureData.txt
# ./Data/rMATS/A5SS/A5SS_featureData.txt
less -SN ./Data/rMATS/SE/SE_featureData.txt

mkdir -p ./work/rmats/${id}

echo ${id} is running......
echo "./work/star/${id}/${id}.Aligned.sortedByCoord.out.bam" > ./work/star/${id}/BAM_fls.txt

rmats.py \
--b1 ./work/star/${id}//BAM_fls.txt \
--gtf ${ref_gtf} \
--od ./work/rmats/${id} \
--tmp ./work/rmats/${id}/tmp \
-t paired \
--readLength 125 \
--variable-read-length \
--nthread 16 \
--statoff
# readLength参考如下
# zcat ./work/fq/${id}_1.fastq.gz | awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' | less

(3)单样本Intron矩阵

统计Reads对于每个内含子的比对情况,逐碱基累加作为比对结果。

# reference
less -SN Data/MARVEL/PSI/RI/Counts_by_Region.txt

mkdir -p ./work/intron/${id}
echo ${id} is running......
samtools view -H ./work/star/${id}/${id}.Aligned.sortedByCoord.out.bam | \
  grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}' > ./work/star/${id}/sorted_chr_in_bam.txt
# 预定义R脚本,生成bedtools计算所需的两个文件:染色体大小以及内含子坐标
# https://github.com/lishensuo/utils/blob/main/marvel/rscript_bedtools_input.R
Rscript ../basic/rscript_bedtools_input.R ${id}

bedtools coverage \
-g ./work/intron/${id}/hg38.chrom.sizes.txt \
-split \
-sorted \
-a ./work/intron/${id}/RI_Coordinates_sorted.bed \
-b ./work/star/${id}/${id}.Aligned.sortedByCoord.out.bam > \
./work/intron/${id}/intron_count.txt \
-d

(4)单样本gene矩阵

最后需要对基因表达水平进行定量。教程推荐使用RSEM软件,并进行TPM类似的标准化处理

# reference
less -SN ./Data/RSEM/TPM.txt

mkdir -p ./work/rsem/${id}
rsem-calculate-expression \
--bam \
--paired-end \
-p 16 \
./work/star/${id}/${id}.Aligned.toTranscriptome.out.bam \
../basic/gtf/rsem_index/hg38 \
./work/rsem/${id}/${id}

(1)如上,由于4步的分析过程中产生的中间数据可能超过数百兆;且SMART-seq往往涉及成百上千的细胞。为节约空间,可在每个样本分析完成后,删除不必要的数据。

rm ./work/fq/${id}/${id}*
rm ./work/star/${id}/*bam
rm -rf ./work/rmats/${id}/tmp/*
rm ./work/rsem/${id}/${id}.transcript.bam
rm ./work/rsem/${id}/${id}.isoforms.results

(2)多样本批量后台运行方式

cat ../basic/single_sample_upstream.sh
chmod u+x ../basic/single_sample_upstream.sh
# 单样本测试
# ../basic/single_sample_upstream.sh ERR1562273

#
parallel多线程批量运行
nohup parallel -j 5 ../basic/single_sample_upstream.sh {} ::: \
$(grep -v "sample" ./SJ_phenoData.txt | awk '{print $1}') \
1> ./work/single_sample_upstream.log 2>&1 &

1.3.2 整合多样本结果

即分步整合上游4个步骤中的每个步骤的所有样本结果;均使用R语言完成

library(tidyverse)
library(parallel)
sample_meta = data.table::fread("./SJ_phenoData.txt",data.table=F)
samples=sample_meta$sample.id
  • step1: 汇总剪切位点矩阵
sj.list = mclapply(samples, function(sample){   
     sj_sp = data.table::fread(paste0("./work/star/",sample,"/",sample,".SJ.out.tab"), data.table=F)
     if(nrow(sj_sp)==0) return(NULL)
     sj_sp = sj_sp %>% 
       dplyr::mutate(`coord.intron`=paste0(V1,":",V2,":",V3)) %>% 
       dplyr::select(`coord.intron`, V7)
     colnames(sj_sp)[2] = sample
     return(sj_sp)
},mc.cores=10)
names(sj.list) = samples
sj.list = sj.list[!unlist(lapply(sj.list, is.null))]

sj_res = sj.list[[1]]
for(x in sj.list[-1]){
     sj_res = dplyr::full_join(sj_res, x)
}

for(sample in setdiff(samples,names(sj.list))){
     sj_res[,sample]=NA
}
dim(sj_res)

write.table(sj_res, file=paste0("./work/merge/SJ.txt"),
     quote=F, row.names=F, sep="\t")
  • step2:整理AS注释结果 这一步最为复杂,需要理解每一种剪切事件的定义以及正负链的差异参考官方示意:https://wenweixiong.github.io/Splicing_Nomenclature
# https://github.com/lishensuo/utils/blob/main/marvel/rscript_Anno_SJ_rMATs.R
source(paste0("../basic/rscript_Anno_SJ_rMATs.R"))

AS_types = c("SE", "RI", "MXE", "A3SS", "A5SS")

for(AS_type in AS_types){
     # AS_type = AS_types[1]
     print(AS_type)
     fls_AS = list.files(paste0("./work/rmats"), pattern=paste0("fromGTF.",AS_type,".txt"),
          recursive=T, full.name=T)

     if (AS_type=="SE"){
          func = se_func
     } else if (AS_type=="RI"){
          func = ri_func 
     } else if (AS_type=="MXE"){
          func = mxe_func 
     } else if (AS_type=="A3SS"){
          func = a3ss_func 
     } else if (AS_type=="A5SS"){
          func = a5ss_func
     }

     df_AS = mclapply(fls_AS, function(fls){
          rmats_AS = data.table::fread(fls, data.table=F)
          rmats_AS = rmats_AS %>%
               dplyr::mutate(tran_id = func(.)) %>% 
               dplyr::rename(gene_id=GeneID,gene_short_name=geneSymbol) %>% 
               dplyr::select(tran_id, gene_id, gene_short_name)
          return(rmats_AS)
     },mc.cores=10) %>% do.call(rbind, .) %>% 
          dplyr::distinct()
     write.table(df_AS, row.names=F, sep="\t",quote=F,
          file=paste0("./work/merge/as_",AS_type,"_featureData.txt"))
}
  • step3:汇总intron矩阵
int_mt.list = mclapply(samples, function(sample){
     # sample = samples[1]
     int_mt = data.table::fread(paste0("./work/intron/",sample,"/intron_count.txt")) 
     int_mt = int_mt %>% 
          dplyr::mutate(`coord.intron`=paste(V1,V2,V3, sep=":")) %>% 
          dplyr::group_by(coord.intron) %>% 
          dplyr::summarize(count=sum(V5)) %>% 
          dplyr::select(coord.intron, count) %>% as.data.frame()
     colnames(int_mt)[2] = sample
     return(int_mt)
},mc.cores=10)

intron_res = int_mt.list[[1]]
for(x in int_mt.list[-1]){
     intron_res = dplyr::full_join(intron_res, x)
}
dim(intron_res)

write.table(intron_res, file=paste0("./work/merge/intron_count_by_region.txt"),
     quote=F, row.names=F, sep="\t")
  • 汇总gene矩阵
gene_mt.list = mclapply(samples, function(sample){
     fls=paste0("./work/rsem/",sample,"/",sample,".genes.results")
     gene_mt = data.table::fread(fls, data.table=F) %>% 
          dplyr::select(gene_id, TPM)
     colnames(gene_mt)[2] = sample
     return(gene_mt)
},mc.cores=10)
gene_res = gene_mt.list[[1]]
for(x in gene_mt.list[-1]){
     gene_res = dplyr::full_join(gene_res, x)
}
dim(gene_res)
write.table(gene_res, file=paste0("./work/merge/rsem_tpm.txt"),,
     quote=F, row.names=F, sep="\t")

2、下游分析

后续根据教程提供的数据学习,需要安装的主要的MARVEL包以及其它辅助分析包,具体可参考教程。

# 建议再单独建立一个conda R环境
install.packages("MARVEL")

2.1 创建marvel对象

需要准备七类文件,读入为R语言对象

样本表型data.frame:第一列名为sample.id

剪切位点计数矩阵:第一列名为coord.intron

剪切事件类型注释列表:由数据框包装而出的list,每个数据库至少包含tran_id与gene_id

内含子计数矩阵:第一列名为coord.intron

基因表达矩阵:经标准化处理(RPKM/FPKM/TPM),暂时不要log转换

基因类型data.frame:至少包含3列,gene_id, gene_short_name, gene_type

上游分析所用到的GTF数据

library(tidyverse)
library(MARVEL)

# (1) 细胞表型分组
df.pheno = data.table::fread("./SJ_phenoData.txt",data.table=F)

# (2) Gene文件
df.tpm <- read.table("./work/merge/rsem_tpm.txt", sep="\t", header=TRUE)
dim(df.tpm)


# (3) GTF文件
gtf <- data.table::fread("../basic/gtf/gencode.v31.annotation.gtf", sep="\t", header=FALSE, na.strings="NA",quote="\"") %>% 
     as.data.frame()

# (4) Gene类型
gene.feature = subset(gtf, V3=="gene") 
df.tpm.feature <- str_split(gene.feature$V9, ";", simplify=T) %>% 
     as.data.frame() %>%
     dplyr::mutate(gene_id=str_match(V1, '"(.*)"')[,2]) %>% 
     dplyr::mutate(gene_short_name=str_match(V3, '"(.*)"')[,2]) %>%      
     dplyr::mutate(gene_type=str_match(V2, '"(.*)"')[,2]) %>% 
     dplyr::select(gene_id, gene_short_name, gene_type) %>% 
     dplyr::filter(gene_id %in% df.tpm$gene_id)
df.tpm.feature = df.tpm.feature[match(df.tpm$gene_id, df.tpm.feature$gene_id),]
rownames(df.tpm.feature) = seq(nrow(df.tpm.feature))

# (5) SJ文件
sj = data.table::fread("./work/merge/SJ.txt", sep="\t", header=TRUE, na.strings="NA") %>% 
     as.data.frame()
dim(sj)

# (6) AS文件
df.feature.list = lapply(c("SE", "MXE", "RI", "A5SS", "A3SS"), function(x){
     # x = "SE"
     df.feature = read.table(paste0("./work/merge/as_",x,"_featureData.txt"), 
          sep="\t", header=TRUE, na.strings="NA") %>% 
          dplyr::distinct(tran_id, .keep_all=TRUE) %>% 
          dplyr::left_join(df.tpm.feature)
     return(df.feature)
})
names(df.feature.list) <- c("SE", "MXE", "RI", "A5SS", "A3SS")

# (7) Intron文件
df.intron.counts <- data.table::fread("./work/merge/intron_count_by_region.txt", 
     sep="\t", header=TRUE, na.strings="NA") %>% 
     as.data.frame()
dim(df.intron.counts)
## MARVEL object
marvel <- CreateMarvelObject(SpliceJunction=sj,
                             SplicePheno=df.pheno,
                             SpliceFeature=df.feature.list,
                             IntronCounts=df.intron.counts,
                             GeneFeature=df.tpm.feature,
                             Exp=df.tpm,
                             GTF=gtf
                             )

2.2 预处理步骤

(1)基因表达log转换

marvel <- TransformExpValues(MarvelObject=marvel,
                             offset=1,
                             transformation="log2",
                             threshold.lower=1
                            )

(2)检测额外AS事件

# 如果基因表达水平整体较低,则不考虑该基因的可变剪切事件
# min.expr:判定单个细胞是否表达基因的阈值
# min.cells: 基于上述条件,判定细胞群表达该基因的阈值
marvel <- DetectEvents(MarvelObject=marvel,
                       min.cells=50, #细胞群的表达百分比
                       min.expr=1,   #认为细胞表达该基因的阈值
                       track.progress=FALSE,
                       EventType="AFE"
                       )
marvel <- DetectEvents(MarvelObject=marvel,
                       min.cells=50,
                       min.expr=1,
                       track.progress=FALSE,
                       EventType="ALE"
                       )
length(marvel$SpliceFeature)
# 7

(3)计算PSI值

计算一个基因对于特定剪切事件的PSI值,这是后续分析的基础

# CoverageThreshold: 支持该剪切事件的最小reads数,小于该阈值标记为NA
# UnevenCoverageMultiplier: 针对SE与MXE
marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     UnevenCoverageMultiplier=10,
                     EventType="SE"
                     )

marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     UnevenCoverageMultiplier=10,
                     EventType="MXE"
                     )

marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     EventType="RI",
                     thread=4  # only support RI
                     )

marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     EventType="A5SS"
                     )

marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     EventType="A3SS"
                     )
 
marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     EventType="AFE"
                     )
    
marvel <- ComputePSI(MarvelObject=marvel,
                     CoverageThreshold=10,
                     EventType="ALE"
                     )

(4)保存中间结果

  • 首先检查不同矩阵与对应的meta行列名信息一致
marvel <- CheckAlignment(MarvelObject=marvel, level="SJ")
marvel <- CheckAlignment(MarvelObject=marvel, level="splicing")
marvel <- CheckAlignment(MarvelObject=marvel, level="gene")
marvel <- CheckAlignment(MarvelObject=marvel, level="splicing and gene")
  • 可进一步选取感兴趣的细胞群
index.1 <- which(df.pheno$cell.type %in% c("iPSC", "Endoderm"))
index.2 <- which(df.pheno$qc.seq=="pass")
index <- intersect(index.1, index.2)
sample.ids <- df.pheno[index, "sample.id"]
marvel <- SubsetSamples(MarvelObject=marvel,
                        sample.ids=sample.ids
                        )
  • 保存中间结果,方便后续分析
save(marvel, file="./MARVEL2.RData")

# 后续以官方提供的示例数据学习
# load("../Data/MARVEL.RData")

2.3 细胞群AS特征分析

以其中84个iPSC细胞群进行演示

sample.ids = marvel$SplicePheno  %>% 
    dplyr::filter(cell.type=="iPSC") %>% 
    dplyr::pull(sample.id) 

(1)AS类型占比

marvel <- CountEvents(MarvelObject=marvel,
                      sample.ids=sample.ids,
                      min.cells=25
                      )
# 对于一个基因至少有25个细胞的AS PSI值为非缺失值,则判定发生了可变剪切事件
marvel$N.Events$Table
#   event_type freq        pct
# 1         SE 5270 40.1523810
# 3         RI 2030 15.4666667
# 6        AFE 1789 13.6304762
# 5       A3SS 1746 13.3028571
# 4       A5SS 1494 11.3828571
# 7        ALE  724  5.5161905
# 2        MXE   72  0.5485714
marvel$N.Events$Plot 
# ggplot绘图体系,下同

(2)AS modality

判断细胞群中每个基因的每种可变剪切的PSI分布类型,如下可分为7类

marvel <- AssignModality(MarvelObject=marvel,
                         sample.ids=sample.ids,
                         min.cells=25,
                         seed=1
                         )
marvel$Modality$Results[1:5, c("tran_id", "event_type", "gene_id", "gene_short_name", "modality.bimodal.adj")]
table(marvel$Modality$Results$modality.bimodal.adj, marvel$Modality$Results$event_type)
#                    A3SS A5SS  AFE  ALE  MXE   RI   SE
# Bimodal              20    7    4    6    0    4   17
# Excluded.Dispersed  386  484  538  189   18  783 1260
# Excluded.Primary    289  282  477  101   22  712 1049
# Included.Dispersed  563  385  372  237   12  352 1685
# Included.Primary    439  291  346  176   17   47 1184
# Middle                9   10    9    3    2   51   15
# Multimodal           40   35   43   12    1   81   60
  • 后续有两种统计方式:仅按modality分,同时按modality与AS type分。以后者作为演示
marvel <- PropModality(MarvelObject=marvel,
                       modality.column="modality.bimodal.adj",
                       modality.type="extended",
                       event.type=c("SE", "MXE", "RI", "A5SS", "A3SS", "AFE", "ALE"),
                       across.event.type=TRUE,
                       prop.test="chisq",
                       prop.adj="fdr",
                       xlabels.size=8
                       )
marvel$Modality$Prop$BarChart$Stats
head(marvel$Modality$Prop$BarChart$Table)
marvel$Modality$Prop$BarChart$Plot

2.4 细胞群差异分析

(1)差异基因分析

cell.group.g1 = marvel$SplicePheno  %>% 
    dplyr::filter(cell.type=="iPSC") %>% 
    dplyr::pull(sample.id)  # (reference)
cell.group.g2 = marvel$SplicePheno  %>% 
    dplyr::filter(cell.type=="Endoderm") %>% 
    dplyr::pull(sample.id)

marvel <- CompareValues(MarvelObject=marvel,
                        cell.group.g1=cell.group.g1,
                        cell.group.g2=cell.group.g2,
                        min.cells=3,
                        method="wilcox",
                        method.adjust="fdr",
                        level="gene",
                        show.progress=FALSE
                        )
marvel$DE$Exp$Table[1:5, ]

(2)差异AS分析

# time consuming
marvel <- CompareValues(MarvelObject=marvel,
                        cell.group.g1=cell.group.g1,
                        cell.group.g2=cell.group.g2,
                        min.cells=25,
                        method=c("ad", "dts"),  #同时使用两种的推荐差异方法
                        method.adjust="fdr",
                        level="splicing",
                        event.type=c("SE", "MXE", "RI", "A5SS", "A3SS", "ALE", "AFE"),
                        show.progress=FALSE
                        )
names(marvel$DE$PSI$Table)
head(marvel$DE$PSI$Table[["ad"]])
head(marvel$DE$PSI$Table[["dts"]])

(3)差异基因&AS分析

  • 对差异可变剪切的基因执行基因表达差异分析
marvel <- CompareValues(MarvelObject=marvel,
                        cell.group.g1=cell.group.g1,
                        cell.group.g2=cell.group.g2,
                        psi.method=c("ad", "dts"),
                        psi.pval=c(0.10, 0.10),
                        psi.delta=0,
                        method.de.gene="wilcox",
                        method.adjust.de.gene="fdr",
                        downsample=FALSE,
                        show.progress=FALSE,
                        level="gene.spliced"
                        )
head(marvel$DE$Exp.Spliced$Table)

对于上述差异分析结果均可快速火山图,标记marker基因等

2.5 差异AS进阶分析

(1)差异PSI分类

  • Explicit:明显的变化;Implicit:有限的变化;Restricted:微小的变化
marvel <- ModalityChange(MarvelObject=marvel,
                     method=c("ad", "dts"),
                     psi.pval=c(0.10, 0.10)
                     )
table(marvel$DE$Modality$Table$modality.change)
# Explicit   Implicit Restricted 
#    161        280       1113
head(marvel$DE$Modality$Table)
marvel$DE$Modality$Plot

cell.group.list <- list("iPSC"=cell.group.g1,
                        "Endoderm"=cell.group.g2
                        )
tran_id <- "chr3:129171771:129171634|129171655:-@chr3:129171446:129171538"
# tran_id <- "chr6:21596763:21598248:+@chr6:21598321:21599378"
# tran_id <- "chr3:109337464:109337652:-@chr3:109333920|109333993:109333870"
marvel <- PlotValues(MarvelObject=marvel,
                     cell.group.list=cell.group.list,
                     feature=tran_id,
                     xlabels.size=5,
                     level="splicing",
                     min.cells=25
                     )
marvel$adhocPlot$PSI 

(2)差异基因与AS

  • Coordinated:基因高表达,相应的PSI也增高;
  • Opposing:基因表达与PSI变化方向相反;
  • Isoform-switching :基因表达不变,PSI发生变化
  • Complex:其它复杂情况
marvel <- IsoSwitch(MarvelObject=marvel,
                    method=c("ad", "dts"),
                    psi.pval=c(0.10, 0.10),
                    psi.delta=0,
                    gene.pval=0.10,
                    gene.log2fc=0.5
                    )

marvel$DE$Cor$Plot
head(marvel$DE$Cor$Table_Raw)
table(marvel$DE$Cor$Table$cor)
# Coordinated    Opposing  Iso-Switch     Complex 
#         192         158         331         113

## 以Coordinated的DHX9与相应的AFE剪切事件为例
# gene
df.feature <- marvel$GeneFeature
gene_id <- df.feature[which(df.feature$gene_short_name=="DHX9"), "gene_id"]
marvel <- PlotValues(MarvelObject=marvel,
                   cell.group.list=cell.group.list,
                   feature=gene_id,
                   maintitle="gene_short_name",
                   xlabels.size=7,
                   level="gene"
                   )
plot.1_gene <- marvel$adhocPlot$Exp
# Splicing
tran_id <- "chr1:182839347:182839456|182839734:182839935:+@chr1:182842545:182842677"
marvel <- PlotValues(MarvelObject=marvel,
                   cell.group.list=cell.group.list,
                   feature=tran_id,
                   xlabels.size=7,
                   level="splicing",
                   min.cells=25
                   )
plot.1_splicing <- marvel$adhocPlot$PSI
library(patchwork)
plot.1_gene + plot.1_splicing

(3)NMD预测

  • nonsense-mediated decay (NMD)无义介导的mRNA降解预测
  • 相比对照组,观测组所剪切的外显子可能由于包含premature stop codons (PTCs),导致编码提前结束,从而进一步使得表达量降低。
  • 本质上是对可产生PTC的可变剪切与下调基因取交集。
marvel <- ParseGTF(MarvelObject=marvel)
marvel <- FindPTC(MarvelObject=marvel,
                  method=c("ad", "dts"),
                  pval=c(0.10, 0.10),
                  delta=5
                  )

marvel <- PropPTC(MarvelObject=marvel,
                  xlabels.size=8,
                  show.NovelSJ.NoCDS=FALSE, #仅展示Non PTC与PTC
                  prop.test="chisq"
                  )
marvel$NMD$PTC.Prop$Table[1:5, ]

marvel <- CompareExpr(MarvelObject=marvel,
                      xlabels.size=8
                      )
marvel$NMD$NMD.Expr$Plot

marvel <- AnnoVolcanoPlot(MarvelObject=marvel,
                          anno=FALSE,
                          xlabel.size=7
                          )

marvel$NMD$AnnoVolcanoPlot$Plot
marvel$NMD$AnnoVolcanoPlot$Table

未完待续......

相关推荐

数据中台与业务中台总体技术架构设计方案

《数据中台与业务中台总体技术架构设计方案》提出**“开放、稳定、滋养”三原则**,强调通过统一技术架构与框架破除烟囱式系统,构建**“业务中台+数据中台”闭环体系**。方案主张从单体架...

三分钟摸清楚什么叫前后端分离(什么是前后端分离架构?)

什么叫前后端分离?其实,前后端分离的初衷是为了分离前后端开发人员的职责,解决开发模式的问题。说到底,前后端分离就是将前端视图和后端数据进行分离,这样,后端只需要提供接口(后端数据)给前端,而前端也可以...

刚刚,给学妹普及了登录的两大绝学

今天跟大家聊一个比较基础的话题,就是实现登录的方式有哪些?适合刚入行的朋友。华山之Session绝学Session我们称之为会话控制,是一种在服务器端保持会话状态的解决方案。通俗点来讲就是客户...

6种微服务RPC框架,你知道几个?(grpc是微服务框架吗)

开源RPC框架有哪些呢?一类是跟某种特定语言平台绑定的,另一类是与语言无关即跨语言平台的。跟语言平台绑定的开源RPC框架主要有下面几种。Dubbo:国内最早开源的RPC框架,由阿里巴巴公司...

微服务中,Spring Cloud 有哪些注册中心?

SpringCloud是微服务架构中经常使用的一个框架,它提供了一系列工具来帮助开发者构建和管理分布式系统,而服务注册中心又是微服务架构中一个关键组件。那么,SpringCloud支持哪些注册...

Eureka的自我保护机制(eureka自我保护机制原理)

最近遇到一个问题,服务之间调用报错,显示无法路由到指定服务,但是对应的服务是启动的,查询eureka,结果eureka上显示如下,所有实例均消失,我个人对注册中心并没有什么研究,进行正好借此机会简单总...

eureka、zookepeer、nacos的区别(eureka和nacos哪个更好)

前言随着微服务被各大企业应用在项目中,微服务的框架也被更多人学习和使用,但是大部分情况下都是停留在应用层。一、演变过程1.1服务注册和发现基本概念服务注册:将某个或者某些服务的信息(模块的ip和...

40K+Star!Mall电商实战项目开源,附源码、教程合集

最近看了下我的Github,发现mall项目已经突破40K+Star,有点小激动!记得去年8月的时候mall项目刚过20K+Star,时隔1年多已经增长到了40K+Star。今天跟大家聊聊mall项目...

SpringCloud 常见注册中心的比较(springcloud注册过程)

一、概述springcloud是一个非常优秀的微服务框架,要管理众多的服务,就需要对这些服务进行治理,也就是我们说的服务治理,服务治理的作用就是在传统的rpc远程调用框架中,管理每个服务与每个服务之间...

简单介绍Nacos服务注册中心(nacos注册中心有什么用)

Nacos是阿里开源的一个新框架,在分布式的架构中,Nacos同时扮演着服务注册中心和配置中心的角色。今天主要讲的是Nacos作为服务注册中心。分布式中著名的CAP理论,任何一种服务注册中心都只能实现...

入门注册中心——consul(注册中心怎么注册)

基础概念什么是注册中心随着微服务理论发展的成熟,越来越多互联网公司采用微服务架构来支持业务发展。各个微服务之间都需要通过注册中心来实现自动化的注册和发现。注册中心主要有三种角色:服务提供者(RPCS...

08 Eureka的基础知识(eureka replication)

Eureka是Netflix开发的服务发现框架,SpringCloud将它集成在自己的子项目spring-cloud-netflix中,实现SpringCloud的服务发现功能。上图简要描述了Eur...

微服务架构中的服务注册与发现有哪些?Zookeeper、Eu

“大家好,我是码哥,《Redis高手心法》作者,本章节选自《Java面试高手心法58讲》专栏。随着单体应用的拆分,我们面临的首要问题就是采用哪种方式实现服务间的调用,像之前单体应用可能直接在配...

Eureka 都挂了,微服务还能调通吗?

如果你做过微服务开发,这个面试题应该能够立马答出来,如果你没做过微服务开发,但是学过一些SpringCloud组件的用法,这个问题可能要稍微想一下,但是也应该能够答出来。今天就来和大家说说这个问...

ZooKeeper、Eureka、Consul 、Nacos微服务注册中心对比

注册中心前言服务注册中心本质上是为了解耦服务提供者和服务消费者。对于任何一个微服务,原则上都应存在或者支持多个提供者,这是由微服务的分布式属性决定的。更进一步,为了支持弹性扩缩容特性,一个微服务的提供...