SPATA Terminology

To make working with the SPATA-package as easy and intuitive as possible, its functions are divided into families according to what they do. A function's main functionality is described by the verb its name starts with, so you can easily skim through all of them by typing SPATA:: into your console.
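Because every name starts with its family's verb, a family can also be listed programmatically. The following is a small base-R sketch; list_family() is a hypothetical helper written for illustration and is not part of SPATA.

```r
# Hypothetical helper (not part of SPATA): list the exported functions of an
# installed package whose names start with a given verb.
list_family <- function(pkg, verb) {
  exports <- getNamespaceExports(pkg)  # loads the package namespace if needed
  sort(grep(paste0("^", verb), exports, value = TRUE))
}

# Works with any installed package, e.g. base R's utils:
list_family("utils", "read")

# Assuming SPATA is installed, this would list e.g. all plot-functions:
# list_family("SPATA", "plot")
```

Note that plain prefix matching also picks up any other export that happens to start with the same letters, so the result is a quick overview rather than an authoritative family list.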

Functions

create-functions let you create new content and immediately add it to your spata-object. They return an updated version of your spata-object (e.g. createSegmentation()).
get-functions extract data from your spata-object in a tidy-data fashion so that you can carry out personal analyses beyond what SPATA currently offers (e.g. getCoordinates()).
joinWith-functions take extracted data.frames and join additional information to them via the barcodes-variable (e.g. joinWithGeneSets()).
add-functions add content (e.g. from your personal analysis) to your spata-object via key-variables, making it available to all SPATA-internal functions. They return an updated version of your spata-object (e.g. addFeature(), addGeneSet()).
plot-functions plot content. In almost all cases they return a ggplot-object that can be further customized according to the rules of the ggplot2-framework (e.g. plotSurface()).
find- & calculate-functions attempt to find clusters, patterns etc., which can eventually be added as new features via addFeature() (e.g. findDE()).
compile-functions assemble data.frames or objects of other analysis toolkits from the data of your spata-object (e.g. compileCellDataSet() for monocle3).
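The get-, joinWith- and add-families all rest on the same idea: every piece of information is attached to a barcode. As a rough base-R sketch with mock data (not SPATA's implementation), the join a joinWith-function performs can be thought of as a left join on the barcodes variable:

```r
# Mock coordinate data.frame shaped like a coords_df (barcodes, sample, x, y).
# Barcodes and values are made up for illustration.
coords_df <- data.frame(
  barcodes = c("AAAC-1", "AAAG-1", "AACT-1"),
  sample   = "sample_1",
  x        = c(10, 20, 30),
  y        = c(5, 15, 25)
)

# Mock per-barcode information, e.g. a cluster assignment (rows in a different order).
cluster_df <- data.frame(
  barcodes        = c("AAAG-1", "AAAC-1", "AACT-1"),
  seurat_clusters = factor(c("1", "0", "1"))
)

# Left join via the barcodes key: every row of coords_df is kept and
# receives the cluster label of the matching barcode.
joined_df <- merge(coords_df, cluster_df, by = "barcodes", all.x = TRUE, sort = FALSE)
print(joined_df)
```

Because the join is keyed on barcodes rather than on row position, the order of rows in either data.frame does not matter.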

Arguments

As every function accomplishes something unique, every function has its own special arguments. Still, we attempted to group them into argument-families whose members refer to the same aspects across functions.

1. SPATA-object

object takes the spata-object from which the function gets all its information. It is the most frequently used argument and usually the first argument of every function that needs it.

2. Data input

As SPATA draws its main power from the tidyverse, data.frames are a vital part of the communication between functions and of the framework we attempt to provide. Whenever a function takes a data.frame as input, it does so via an argument of the df-family.

spata_df refers to data.frames that contain at least the variables barcodes and sample (obtained by e.g. getSpataDf()*).

coords_df refers to coordinate data.frames which contain at least the variables barcodes, sample, x and y (obtained by e.g. getCoordinates()*).

de_df refers to differential expression data.frames which contain the results of differential gene expression analysis (obtained by e.g. findDE()).

stdf refers to summarized trajectory data.frames which contain information about spatial trajectories drawn with createTrajectories() (obtained by e.g. getTrajectoryDf()).

atdf refers to assessed trajectory data.frames which contain the results of spatial trajectory modelling (obtained by e.g. assessTrajectoryTrends()).

df is used if the function does not require the data.frame to contain specific variables or if its requirements are unique.
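The minimal requirements of spata_df and coords_df can be expressed as a simple check. This is a sketch only: check_df_family() and required_vars are hypothetical names, not SPATA functions, and only the two families whose required variables are spelled out above are covered (the exact columns of de_df, stdf and atdf are not listed here).

```r
# Hypothetical lookup (not part of SPATA): minimal variables per df-family,
# as listed above for spata_df and coords_df.
required_vars <- list(
  spata_df  = c("barcodes", "sample"),
  coords_df = c("barcodes", "sample", "x", "y")
)

# Returns TRUE if df contains all variables required by the given family,
# otherwise stops with a message naming the missing variables.
check_df_family <- function(df, family) {
  if (!family %in% names(required_vars)) stop("unknown df-family: ", family)
  missing_vars <- setdiff(required_vars[[family]], names(df))
  if (length(missing_vars) > 0) {
    stop(family, " is missing: ", paste(missing_vars, collapse = ", "))
  }
  TRUE
}

coords_df <- data.frame(barcodes = "AAAC-1", sample = "sample_1", x = 10, y = 5)
check_df_family(coords_df, "coords_df")                           # TRUE
check_df_family(coords_df[, c("barcodes", "sample")], "spata_df") # TRUE
```

Since a coords_df contains every variable of a spata_df, any coords_df would also pass as a spata_df.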

3. Informative variables

These are variables of the data.frame you provide via an argument of the df-family, or of the data.frame the function generates from scratch in the background. Depending on what the function does, the arguments that take them have different names, but they work in a similar fashion.

color_to refers to the ggplot2-syntax in which informative variables such as seurat_clusters are mapped onto plot aesthetics such as color. It takes a character value denoting the gene, gene set or feature of interest.
variable also refers to all types of informative variables and is used if the function's output is not a plot.
variables also refers to all types of informative variables and is used if more than one variable can be specified.
genes/gene_sets/features only take input of the respective type.
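The difference between the generic variable(s) arguments and the type-specific genes/gene_sets/features arguments can be sketched as a lookup. Everything here is hypothetical: the known list, variable_type() and only_genes() are illustrative helpers, not SPATA internals, and apart from GFAP and seurat_clusters (which appear in this text) the names are placeholders.

```r
# Hypothetical registry of informative variables by type (placeholder names).
known <- list(
  gene     = c("GFAP", "MBP"),
  gene_set = c("example_gene_set"),
  feature  = c("seurat_clusters", "segment")
)

# variable-style: any type is accepted; returns which type the value denotes.
variable_type <- function(variable) {
  hit <- names(known)[vapply(known, function(v) variable %in% v, logical(1))]
  if (length(hit) == 0) stop("unknown variable: ", variable)
  hit[1]
}

# variables-style: several values of any type at once.
vapply(c("GFAP", "seurat_clusters"), variable_type, character(1))

# genes-style: only genes are valid input.
only_genes <- function(genes) {
  all(vapply(genes, variable_type, character(1)) == "gene")
}
only_genes(c("GFAP", "MBP"))             # TRUE
only_genes(c("GFAP", "seurat_clusters")) # FALSE
```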

(See Genes-, Gene-sets & Features for more information.)

4. Comparing arguments

across takes one categorical feature that denotes group membership, such as clusters or segments, and is found in functions that compare certain aspects across these groups.
across_subset takes specific values (groups) of the categorical feature specified in across.

E.g. findDE() with arguments across = "seurat_clusters" and across_subset = c("0", "1", "2") would look for genes that are differentially expressed across the Seurat clusters 0, 1 and 2.
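Conceptually, across and across_subset narrow the data down to the groups being compared before any comparison runs. A base-R sketch with mock data (an illustration of the idea, not how findDE() is implemented internally):

```r
# Mock spata_df-like data with a categorical feature (made-up values).
df <- data.frame(
  barcodes        = paste0("bc", 1:6),
  sample          = "sample_1",
  seurat_clusters = factor(c("0", "1", "2", "3", "1", "0"))
)

across        <- "seurat_clusters"
across_subset <- c("0", "1", "2")

# Keep only barcodes whose group is listed in across_subset, then drop the
# unused factor levels so the comparison covers exactly these groups.
subset_df <- df[df[[across]] %in% across_subset, , drop = FALSE]
subset_df[[across]] <- droplevels(subset_df[[across]])

table(subset_df[[across]])
```

Dropping unused levels matters because a leftover, empty level (here cluster "3") would otherwise still show up as a group in downstream comparisons.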

5. Plotting

Behind the scenes, surface plots are scatterplots in which the x- and y-coordinates of the barcode spots are mapped onto the x- and y-aesthetics of the plot. The dots representing the barcode spots can be visually customized with the pt_*-arguments.

E.g. plotSurface() with arguments pt_size = 2 and pt_clrsp = "inferno" would generate a surface plot with dots of size 2 that uses the color spectrum "inferno" to display a continuous informative variable specified by color_to, such as GFAP expression levels.
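Since surface plots are ordinary ggplot2 scatterplots, the effect of the pt_*-arguments can be approximated in plain ggplot2. This is a sketch with mock data. Mapping pt_size to the point size and pt_clrsp to a viridis option is an assumption made for illustration, not SPATA's actual code.

```r
library(ggplot2)

# Mock coords_df-like grid with made-up GFAP expression values.
set.seed(1)
plot_df <- expand.grid(x = 1:10, y = 1:10)
plot_df$GFAP <- runif(nrow(plot_df))

color_to <- "GFAP"

# x/y coordinates mapped onto the x/y aesthetics, the informative variable onto color.
p <- ggplot(plot_df, aes(x = x, y = y, color = .data[[color_to]])) +
  geom_point(size = 2) +                       # assumed counterpart of pt_size = 2
  scale_color_viridis_c(option = "inferno") +  # assumed counterpart of pt_clrsp = "inferno"
  coord_equal() +
  theme_void()

# print(p) renders the plot.
```

Because plotSurface() likewise returns a ggplot-object, further layers such as ggplot2::labs() can be added to its output with + in the same way.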