spata-segmentation-and-clustering.Rmd
Discrete (syn. categorical) variables divide a sample’s barcode-spots into experimental groups that can be compared against each other.
Barcode-spots can be grouped manually by delineating spatial regions of interest and naming them (see below for more).
This tutorial guides you through the functions SPATA provides for grouping barcode-spots.
# load packages
library(SPATA)
library(magrittr)
library(ggplot2)

# load object
spata_obj <- loadSpataObject(input_path = "data/spata-obj-example-create-segm.RDS")
The spatial dimension of spatial transcriptomics invites comparing regions of interest with respect to their gene expression or other features. The segment to which each barcode-spot belongs is stored as an additional variable in the feature data. Assuming that you have not created any segments so far, your feature data.frame will look like this:
Barcode-spots that have not been assigned to a segment carry the empty string "" as there simply isn’t a segment to which they belong.
To assign barcode-spots to a segment, use the createSegmentation() function. It opens a mini-shiny application in which you define the extent of every segment by setting the vertices of the polygon that forms the segment’s border.
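The idea of assigning spots to a segment by a polygon can be sketched in base R with a ray-casting test. This is only an illustration under stated assumptions: `point_in_polygon()` is a hypothetical helper, the coordinates are made up, and nothing here claims to be how createSegmentation() works internally.

```r
# Ray-casting test: is the point (px, py) inside the polygon given by
# the vertex vectors vx and vy? (Illustrative helper, not part of SPATA.)
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the height py?
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses) {
      # x-position at which the edge crosses the horizontal line y = py
      x_cross <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Toy coordinates standing in for barcode-spots (hypothetical values)
spots <- data.frame(
  barcodes = c("A", "B", "C", "D"),
  x = c(1, 3, 5, 2),
  y = c(1, 3, 5, 3.5),
  stringsAsFactors = FALSE
)

# Vertices of a square segment (hypothetical)
poly_x <- c(0, 4, 4, 0)
poly_y <- c(0, 0, 4, 4)

in_segment <- mapply(point_in_polygon, spots$x, spots$y,
                     MoreArgs = list(vx = poly_x, vy = poly_y))

# Spots outside the polygon keep the empty string as segment name
spots$segment <- ifelse(in_segment, "hypoxic", "")
spots
```

Spots that lie exactly on the polygon border are a corner case that this sketch does not treat specially.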
To visualize the current segmentation of your sample, run plotSegmentation().
# display the current segmentation
plotSegmentation(spata_obj, pt_size = 2.5)
Segment membership is now a discrete variable in your spata-object’s feature data. It can be used in SPATA like any other discrete feature and extracted in a tidy-data fashion as well.
getFeatureVariables(spata_obj, features = "segment", return = "data.frame")
If you need the barcodes of a segment’s barcode-spots, you can obtain them, for example, via getCoordinatesSegment().
getCoordinatesSegment(spata_obj, of_segment = "hypoxic")
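Because the segment is stored as an ordinary variable, the same information can also be pulled out of any tidy data.frame with base R. The following is a minimal sketch on a toy data.frame; the barcodes are invented, and the column names `barcodes` and `segment` are assumed to match the feature data.

```r
# Toy feature data.frame; barcodes and values are hypothetical
feature_df <- data.frame(
  barcodes = c("AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"),
  segment  = c("hypoxic", "", "hypoxic", ""),
  stringsAsFactors = FALSE
)

# Count spots per segment; "" marks spots without a segment
table(feature_df$segment)

# Barcodes belonging to the segment "hypoxic"
hypoxic_barcodes <- feature_df$barcodes[feature_df$segment == "hypoxic"]
hypoxic_barcodes
```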
In transcriptomic analysis, clustering divides barcode-spots into groups based on the similarity of their gene-expression profiles. Several algorithms can be used to divide your sample into subgroups. While initiateSpataObject_10X() integrates the clustering algorithm used by the Seurat package, there are many more. Because the appropriate clustering depends heavily on the biological question, it makes sense to try several approaches.
The package Monocle3 offers Louvain and Leiden clustering via the cluster_cells() function. This function takes several additional parameters that can be used to tune the clustering algorithm to the properties of each individual sample. SPATA’s findMonocleClusters() is a wrapper around Monocle3’s clustering options. It iterates over the parameters you provide and returns a tidy spata data.frame containing each result as a separate variable.
monocle_clusters <-
  findMonocleClusters(object = spata_obj,
                      preprocess_method = "PCA",
                      reduction_method = c("UMAP", "PCA", "tSNE"),
                      cluster_method = c("leiden", "louvain"),
                      k = 5,
                      num_iter = 5)

# output
monocle_clusters
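To get a feeling for how many results such a call produces, the parameter combinations can be enumerated in base R. This is a rough sketch, not SPATA code: the naming pattern `cluster_<method>_<reduction>_k<k>` is inferred from the variable names that appear in this tutorial's output, and `combos` is a hypothetical helper object.

```r
# Parameter values taken from the call above
cluster_method   <- c("leiden", "louvain")
reduction_method <- c("UMAP", "PCA", "tSNE")
k                <- 5

# Every combination of the supplied parameters
combos <- expand.grid(
  cluster_method   = cluster_method,
  reduction_method = reduction_method,
  k                = k,
  stringsAsFactors = FALSE
)

# Variable names following the pattern seen in the tutorial's output
# (the pattern is inferred, not taken from SPATA's source code)
combos$variable <- paste0(
  "cluster_", combos$cluster_method, "_",
  combos$reduction_method, "_k", combos$k
)

combos$variable
```

Two cluster methods times three reduction methods times one value of k give six clustering results, so adding more values to any parameter multiplies the number of variables to inspect.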
Use the helper function examineClusterResults() to quickly see all unique cluster names and to judge which clustering results are worth considering.
# output
examineClusterResults(data = monocle_clusters)
## $cluster_leiden_UMAP_k5
## [1] "Cluster 1" "Cluster 5" "Cluster 3" "Cluster 2" "Cluster 4" "Cluster 6"
## [7] "Cluster 7"
##
## $cluster_louvain_UMAP_k5
## [1] "Cluster 29" "Cluster 43" "Cluster 16" "Cluster 27" "Cluster 37"
## [6] "Cluster 14" "Cluster 36" "Cluster 38" "Cluster 17" "Cluster 8"
## [11] "Cluster 24" "Cluster 23" "Cluster 45" "Cluster 44" "Cluster 7"
## [16] "Cluster 40" "Cluster 34" "Cluster 1" "Cluster 32" "Cluster 11"
## [21] "Cluster 21" "Cluster 6" "Cluster 13" "Cluster 42" "Cluster 31"
## [26] "Cluster 41" "Cluster 12" "Cluster 20" "Cluster 26" "Cluster 2"
## [31] "Cluster 4" "Cluster 28" "Cluster 25" "Cluster 18" "Cluster 46"
## [36] "Cluster 39" "Cluster 30" "Cluster 33" "Cluster 35" "Cluster 10"
## [41] "Cluster 15" "Cluster 9" "Cluster 3" "Cluster 19" "Cluster 5"
## [46] "Cluster 22"
##
## $cluster_leiden_PCA_k5
## [1] "Cluster 1" "Cluster 2"
##
## $cluster_louvain_PCA_k5
## [1] "Cluster 2" "Cluster 6" "Cluster 1" "Cluster 5" "Cluster 3" "Cluster 8"
## [7] "Cluster 9" "Cluster 7" "Cluster 4"
##
## $cluster_leiden_tSNE_k5
## [1] "Cluster 5" "Cluster 3" "Cluster 2" "Cluster 4" "Cluster 1" "Cluster 6"
## [7] "Cluster 7" "Cluster 8"
##
## $cluster_louvain_tSNE_k5
## [1] "Cluster 10" "Cluster 28" "Cluster 6" "Cluster 22" "Cluster 35"
## [6] "Cluster 48" "Cluster 44" "Cluster 41" "Cluster 17" "Cluster 24"
## [11] "Cluster 25" "Cluster 12" "Cluster 23" "Cluster 39" "Cluster 19"
## [16] "Cluster 29" "Cluster 27" "Cluster 11" "Cluster 4" "Cluster 33"
## [21] "Cluster 30" "Cluster 43" "Cluster 37" "Cluster 49" "Cluster 16"
## [26] "Cluster 40" "Cluster 21" "Cluster 13" "Cluster 3" "Cluster 26"
## [31] "Cluster 46" "Cluster 2" "Cluster 32" "Cluster 14" "Cluster 7"
## [36] "Cluster 38" "Cluster 47" "Cluster 9" "Cluster 31" "Cluster 42"
## [41] "Cluster 15" "Cluster 20" "Cluster 18" "Cluster 8" "Cluster 34"
## [46] "Cluster 5" "Cluster 36" "Cluster 45" "Cluster 1"
Only the results ‘cluster_leiden_UMAP_k5’, ‘cluster_leiden_tSNE_k5’ and ‘cluster_louvain_PCA_k5’ seem sensible to work with; the other three yield either far too many groups (46 and 49) or far too few (2).
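This kind of screening can also be done programmatically by counting the distinct clusters per result. The sketch below uses a toy data.frame rather than real SPATA output; the column names `cluster_a` to `cluster_c` and the accepted range of 3 to 10 groups are arbitrary assumptions chosen for illustration.

```r
# Toy stand-in for a clustering result data.frame (hypothetical values)
cluster_df <- data.frame(
  barcodes  = paste0("bc", 1:12),
  cluster_a = paste("Cluster", rep(1:4, each = 3)),  # 4 groups
  cluster_b = paste("Cluster", 1:12),                # 12 groups, one per spot
  cluster_c = paste("Cluster", rep(1, 12)),          # 1 group
  stringsAsFactors = FALSE
)

cluster_cols <- setdiff(names(cluster_df), "barcodes")

# Number of distinct clusters per clustering result
n_clusters <- vapply(cluster_df[cluster_cols],
                     function(x) length(unique(x)),
                     integer(1))
n_clusters

# Keep results whose group count lies within an arbitrary range (assumption)
keep <- names(n_clusters)[n_clusters >= 3 & n_clusters <= 10]
keep
```

What counts as too many or too few groups depends on the sample and the biological question, so the range should be treated as a starting point rather than a rule.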
Via addFeatures() you can add all clustering results to your spata-object simultaneously, which makes them available to every SPATA function. Each variable in monocle_clusters represents one possible way to assign barcodes to experimental groups. Since they are stored as individual variables, they can be referenced, analyzed and visualized one by one.
# feature names before adding
getFeatureNames(spata_obj)
## numeric integer numeric numeric
## "nCount_RNA" "nFeature_RNA" "percent.mt" "percent.RB"
## factor factor character
## "RNA_snn_res.0.8" "seurat_clusters" "segment"
# add the cluster results
spata_obj <-
  addFeatures(object = spata_obj,
              feature_names = c("cluster_leiden_UMAP_k5", "cluster_leiden_tSNE_k5", "cluster_louvain_PCA_k5"),
              feature_df = monocle_clusters,
              key = "barcodes")

# feature names afterwards
getFeatureNames(spata_obj)
## numeric integer numeric
## "nCount_RNA" "nFeature_RNA" "percent.mt"
## numeric factor factor
## "percent.RB" "RNA_snn_res.0.8" "seurat_clusters"
## character character character
## "segment" "cluster_leiden_UMAP_k5" "cluster_louvain_PCA_k5"
## character
## "cluster_leiden_tSNE_k5"
Continue by visualizing your results or by investigating their properties (e.g. differentially expressed genes).
plotSurface(object = spata_obj, color_to = "cluster_leiden_UMAP_k5", pt_size = 2.1, pt_clrp = "aaas") + labs(color = "Leiden UMAP")
plotSurface(object = spata_obj, color_to = "cluster_leiden_tSNE_k5", pt_size = 2.1, pt_clrp = "jco") + labs(color = "Leiden tSNE")
plotSurface(spata_obj, color_to = "cluster_louvain_PCA_k5", pt_size = 2.1, pt_clrp = "nejm") + labs(color = "Louvain PCA")