spata-de-analysis.Rmd
Differential expression analysis aims to discover quantitative changes in gene expression levels between defined experimental groups. In SPATA these experimental groups are defined inside the feature data. More precisely, every discrete variable in the spata-object’s feature data assigns each barcode-spot of a sample to one such experimental group. This includes all groups generated within SPATA, such as segmentation and clustering, as well as any other discrete feature from your own analysis that has been added via addFeature(). See Extract, add & join data on how to do that with ease.
The following tutorial steps guide you through SPATA's built-in differential gene expression functions, using a previously created segmentation as the experimental groups.
# load packages
library(SPATA)
library(magrittr)
library(ggplot2)

# load object
spata_obj <- loadSpataObject(input_path = "data/spata-obj-de-analysis-example.RDS")

plotSegmentation(spata_obj)
As mentioned in previous chapters, SPATA makes use of some well-written functions of the Seurat package, which is widely regarded as a standard for the statistical analysis of scRNA-seq data. The function findDE() is a wrapper around Seurat::FindAllMarkers() and lets you specify the experimental groups across which you want to find differentially expressed genes. You do that via the across argument, which we set to "segment" here, as this is the feature we are interested in for now. (If you are only interested in a subset of the groups the specified variable contains, use the across_subset argument to filter for those.)
de_results <- findDE(object = spata_obj,
                     across = "segment", # denotes experimental group belonging of interest
                     across_subset = c("oxid-phosph-high", "hypoxic"), # subsets the groups
                     method_de = "wilcox",
                     p_val_adj = 0.05)

# output
de_results
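To make the idea behind method_de = "wilcox" more tangible, here is a minimal base-R sketch on simulated data. It is not SPATA's or Seurat's implementation: the expression matrix, the gene names and the helper test_group() are hypothetical, the effect size is a plain difference in means rather than a log fold change, and the Bonferroni adjustment is an assumption based on how Seurat's documentation describes its p_val_adj column.

```r
# Toy illustration only: simulated expression values, not SPATA data.
set.seed(1)

# Hypothetical expression matrix: 3 genes (rows) x 40 barcode-spots (columns)
n_spots <- 40
group <- rep(c("hypoxic", "oxid-phosph-high"), each = n_spots / 2)
expr <- rbind(
  GENE_A = c(rnorm(20, mean = 3), rnorm(20, mean = 1)), # higher in "hypoxic"
  GENE_B = rnorm(n_spots, mean = 2),                    # no group difference
  GENE_C = c(rnorm(20, mean = 1), rnorm(20, mean = 3))  # lower in "hypoxic"
)

# Hypothetical helper: test every gene in one group against all remaining spots
test_group <- function(expr, group, group_of_interest) {
  in_group <- group == group_of_interest
  p_val <- apply(expr, 1, function(x) {
    wilcox.test(x[in_group], x[!in_group])$p.value
  })
  mean_diff <- rowMeans(expr[, in_group, drop = FALSE]) -
    rowMeans(expr[, !in_group, drop = FALSE])
  data.frame(
    gene = rownames(expr),
    cluster = group_of_interest,
    mean_diff = mean_diff,
    p_val = p_val,
    p_val_adj = p.adjust(p_val, method = "bonferroni"), # assumed correction
    row.names = NULL
  )
}

# One block of rows per group, stacked into a single table
toy_de <- do.call(rbind, lapply(unique(group), test_group,
                                expr = expr, group = group))
toy_de
```

The resulting table has one row per gene and group, which is the same long layout that the later filtering and plotting steps rely on.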
To post-process the resulting DE data.frame, use filterDE(). The data.frame is sliced so that, for every cluster, a fixed number of genes is kept, depending on the input for n_highest_FC and n_lowest_pvalue.
filtered_de_results <- filterDE(de_df = de_results,
                                n_highest_FC = 100, # keep the 100 genes with the highest log fold change
                                n_lowest_pvalue = 50, # from these 100 genes keep the 50 genes with the lowest p-value
                                return = "data.frame")

filtered_de_results
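The two-step slicing described above can be sketched in base R on a small toy table. This is an illustration of the logic only, not the filterDE() source: the helper filter_toy_de() and the data are hypothetical, and the column names avg_logFC, p_val_adj, cluster and gene are assumed to follow the Seurat::FindAllMarkers() output layout.

```r
# Hypothetical DE table; column names mimic Seurat::FindAllMarkers() output
toy_de <- data.frame(
  gene      = paste0("GENE_", 1:8),
  cluster   = rep(c("hypoxic", "oxid-phosph-high"), each = 4),
  avg_logFC = c(2.1, 1.7, 0.9, 0.4, 1.9, 1.2, 0.8, 0.3),
  p_val_adj = c(1e-3, 1e-9, 1e-5, 1e-2, 1e-4, 1e-8, 1e-6, 1e-2)
)

# Hypothetical helper: per cluster, keep the genes with the highest fold
# change first, then keep the genes with the lowest p-value among those
filter_toy_de <- function(de_df, n_highest_FC, n_lowest_pvalue) {
  per_cluster <- lapply(split(de_df, de_df$cluster), function(df) {
    df <- head(df[order(df$avg_logFC, decreasing = TRUE), ], n_highest_FC)
    head(df[order(df$p_val_adj), ], n_lowest_pvalue)
  })
  out <- do.call(rbind, per_cluster)
  rownames(out) <- NULL
  out
}

toy_filtered <- filter_toy_de(toy_de, n_highest_FC = 3, n_lowest_pvalue = 2)
toy_filtered

# Counterpart of return = "vector": only the gene names
toy_genes <- unique(as.character(toy_filtered$gene))
```

Because the p-value filter is applied after the fold-change filter, a gene with a very low p-value but a modest fold change can be dropped in the first step, so the order of the two steps matters.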
The classic way to visualize differential gene expression is the heatmap. plotDeHeatmap() takes your DE results and plots the respective heatmap by extracting the genes and barcode-spots of interest. Via additional computation the heatmap is segmented into clear and aesthetically pleasing rectangles. Additional arguments to pheatmap::pheatmap() can be passed via the ... argument.
heatmap <- plotDeHeatmap(object = spata_obj,
                         de_df = filtered_de_results, # specify the data
                         across = "segment", # specify the feature
                         across_subset = unique(filtered_de_results$cluster),
                         hm_colors = viridis::inferno(n = 15), # provide your color spectrum of choice
                         breaks = c(c(-5.5, -0.6), seq(-0.5, 1.8, length.out = 10), c(1.9, 3)),
                         show_rownames = FALSE)

heatmap
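The breaks argument is easier to reason about once you see how break points turn values into color bins. The following base-R sketch reuses the break layout from the call above and maps a few made-up values to bins. It only illustrates the binning idea, not what plotDeHeatmap() or pheatmap does internally: the values are hypothetical and the palette is a stand-in built with colorRampPalette() rather than viridis::inferno().

```r
# Same non-uniform break layout as in the heatmap call
breaks <- c(c(-5.5, -0.6), seq(-0.5, 1.8, length.out = 10), c(1.9, 3))

# n break points delimit n - 1 bins, so one color per bin is needed
n_bins <- length(breaks) - 1
colors <- grDevices::colorRampPalette(c("black", "red", "yellow"))(n_bins)

# Hypothetical scaled expression values and the bin each one falls into
values <- c(-2, -0.55, 0, 1, 1.85, 2.5)
bin <- cut(values, breaks = breaks, include.lowest = TRUE, labels = FALSE)

data.frame(value = values, bin = bin, color = colors[bin])
```

The wide outer bins absorb extreme values, while the ten evenly spaced break points between -0.5 and 1.8 spend most of the color resolution on the range where the bulk of the values is expected to lie.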
To visualize only a subset of genes, use plotDistributionAcross(), which plots the distribution of specific variables across specific subgroups.
genes_of_interest <- filterDE(de_df = de_results,
                              n_highest_FC = 10,
                              n_lowest_pvalue = 10,
                              return = "vector") # obtain only a vector of genes as input for 'variables'

plotDistributionAcross(spata_obj,
                       variables = genes_of_interest,
                       across = "segment",
                       across_subset = c("oxid-phosph-high", "hypoxic"),
                       plot_type = "violin") +
  theme(axis.text.x = element_blank(),
        legend.position = "top")
Another way to visualize your results is to use plotSurfaceComparison().
# keep only the top genes of the "hypoxic" segment
hypoxic_high_genes <- filterDE(de_df = de_results,
                               n_highest_FC = 10,
                               n_lowest_pvalue = 10,
                               across_subset = "hypoxic",
                               return = "vector")

plotSurfaceComparison(spata_obj,
                      variables = hypoxic_high_genes,
                      smooth = TRUE,
                      pt_size = 1)