spata-terminology.Rmd
To make working with the SPATA package as easy and intuitive as possible, its functions are divided into families according to what they do. A function’s main purpose is described by the verb its name starts with, so you can easily skim through all of them by typing SPATA:: into your console.
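As a rough illustration of this naming scheme, the sketch below groups function names by their leading verb in base R. The names are the ones mentioned in this vignette, and the grouping logic is only an assumption about how one might skim them, not something SPATA provides.

```r
# Function names taken from this vignette. With SPATA attached,
# ls("package:SPATA") could supply the full list instead.
fn_names <- c(
  "createSegmentation", "getCoordinates", "joinWithGeneSets",
  "addFeature", "addGeneSet", "plotSurface", "findDE", "compileCellDataSet"
)

# The family is the run of lowercase letters before the first capital letter.
family <- sub("^([[:lower:]]+).*$", "\\1", fn_names)

# One character vector of function names per family.
families <- split(fn_names, family)
print(families)
```

The result is a named list with one entry per verb, which mirrors how the families are meant to be read.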
create (e.g. createSegmentation())
get (e.g. getCoordinates())
join (e.g. joinWithGeneSets())
add (e.g. addFeature(), addGeneSet())
plot (e.g. plotSurface())
find (e.g. findDE())
compile (e.g. compileCellDataSet() for monocle3)
As every function accomplishes something unique, every function has its own special arguments. Still, we attempted to create argument families that usually refer to the same aspects.
object
takes your spata-object, from which it gets all the information. It is the most frequently used argument and usually the first one of every function that needs it.
As SPATA draws its main power from the tidyverse, data.frames are a vital part of the communication between functions and of the framework we attempt to provide. Whenever a function takes a data.frame as input, it has an argument of the df-family.
spata_df
refers to data.frames that contain at least the variables barcodes and sample (obtained by e.g. getSpataDf()).
coords_df
refers to coordinate data.frames that contain at least the variables barcodes, sample, x and y (obtained by e.g. getCoordinates()).
de_df
refers to differential expression data.frames that contain the results of differential gene expression analysis (obtained by e.g. findDE()).
stdf
refers to summarized trajectory data.frames that contain information about spatial trajectories drawn with createTrajectories() (obtained by e.g. getTrajectoryDf()).
atdf
refers to assessed trajectory data.frames that contain the results of spatial trajectory modelling (obtained by e.g. assessTrajectoryTrends()).
df
is used if the function does not require the data.frame to contain specific variables or if its requirements are unique.
Variable arguments refer to variables of the data.frame you specify as input of the df-argument family, or of the data.frame that is generated from scratch in the background. Depending on what the function does, they come with different names but work in a similar fashion.
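The link between the df-family and the variable arguments can be sketched in base R. The helper has_required_vars() and the toy data below are hypothetical and not part of SPATA; they only illustrate the minimum-variable conventions and the idea that a variable argument names a column of the data.frame.

```r
# Hypothetical helper, not part of SPATA:
# does `df` contain all `required` variables?
has_required_vars <- function(df, required) {
  all(required %in% colnames(df))
}

# Toy data with made-up barcodes, coordinates and cluster labels.
toy_df <- data.frame(
  barcodes = c("AAAC-1", "AAAG-1", "AACT-1"),
  sample = "toy_sample",
  x = c(10.5, 12.0, 13.5),
  y = c(4.0, 4.5, 5.0),
  seurat_clusters = factor(c("0", "1", "1")),
  stringsAsFactors = FALSE
)

# Minimum variables of a spata_df and of a coords_df as described above.
is_spata_df  <- has_required_vars(toy_df, c("barcodes", "sample"))
is_coords_df <- has_required_vars(toy_df, c("barcodes", "sample", "x", "y"))

# A variable argument names a column of that data.frame.
variable <- "seurat_clusters"
variable_found <- variable %in% colnames(toy_df)
```

A data.frame that passes the coords_df check also passes the spata_df check, since the latter requires a subset of the same variables.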
color_to
refers to the ggplot2 syntax, in which informative variables such as seurat_clusters are mapped onto aesthetics of the plot, such as color. It takes a character value as input denoting the gene, gene set or feature of interest.
variable
refers to all types of informative variables as well and is used if the function’s output is not a plot.
variables
refers to all types of informative variables as well and is used if more than one variable can be specified.
genes / gene_sets / features
take only input of the respective type.
(See Genes-, Gene-sets & Features for more information.)
across
takes one categorical feature that denotes group membership, such as clusters or segments, and is found in functions that compare certain aspects across these groups.
across_subset
takes specific values of the categorical feature specified in across.
E.g. findDE() with arguments across = "seurat_clusters" and across_subset = c("0", "1", "2") would look for differentially expressed genes across seurat-clusters 0, 1 and 2.
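What across and across_subset select can be sketched on a toy data.frame in base R. This is only an illustration of the idea; how SPATA subsets internally may differ, and the object name spata_obj in the commented call is a placeholder.

```r
# Toy data: one row per barcode spot with a categorical grouping feature.
toy_df <- data.frame(
  barcodes = paste0("spot_", 1:6),
  seurat_clusters = factor(c("0", "1", "2", "3", "0", "2")),
  stringsAsFactors = FALSE
)

across <- "seurat_clusters"
across_subset <- c("0", "1", "2")

# Keep only the spots whose group is named in across_subset.
keep <- as.character(toy_df[[across]]) %in% across_subset
subset_df <- droplevels(toy_df[keep, , drop = FALSE])

# Number of spots per remaining group.
group_sizes <- table(subset_df[[across]])
print(group_sizes)

# The corresponding call from the text (needs a spata-object, so it is not run here):
# findDE(object = spata_obj, across = "seurat_clusters", across_subset = c("0", "1", "2"))
```

Cluster 3 is dropped before any comparison, so only the groups named in across_subset take part.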
Behind the scenes, surface plots are scatterplots in which the x- and y-coordinates of the barcode spots are mapped onto the x- and y-aesthetics of the plot. The dots representing the barcode spots can be visually customized with the pt_*-arguments.
E.g. plotSurface() with arguments pt_size = 2 and pt_clrsp = "inferno" would generate a surface plot with dots of size 2 and with the color spectrum "inferno" used to display a continuous informative variable specified by color_to, such as "GFAP" expression levels.
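The scatterplot idea can be approximated with plain ggplot2. The sketch below uses made-up coordinates and expression values, and it assumes that the "inferno" spectrum corresponds to the viridis option of the same name; it is not SPATA’s own plotting code.

```r
library(ggplot2)

# Toy coordinates and made-up expression values; not real data.
set.seed(1)
toy_coords <- data.frame(
  barcodes = paste0("spot_", 1:50),
  x = runif(50, 0, 100),
  y = runif(50, 0, 100),
  GFAP = runif(50, 0, 1)
)

# Coordinates go onto the x- and y-aesthetics, the variable of interest onto color.
p <- ggplot(toy_coords, aes(x = x, y = y, color = GFAP)) +
  geom_point(size = 2) +                       # analogous to pt_size = 2
  scale_color_viridis_c(option = "inferno") +  # analogous to pt_clrsp = "inferno"
  coord_equal() +
  theme_void()

print(p)
```

Changing the size in geom_point() or the option in scale_color_viridis_c() plays the role that pt_size and pt_clrsp play in plotSurface().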