Newslog • SPATA2

To make an existing object compatible with the newest version, run:


# the output is an object fit for the version of SPATA2 you have installed
my_object <- updateSpataObject(object = my_object)

SPATA2 v3.1.0

SPATA2 v3.1.0 comes with the following major changes and fixes.

Changes:

Fixed spatialTrajectoryScreening() in case of missing capture areas.
Improved precision in computing the capture area of Visium experiments.
Fixed bad polygon display in ggpLayerSpatAnnOutline() with expand_outline = 0.
Improved options and comfort in createSpatialSegmentation().

Additions:

reduceResolutionVisiumHD() to create SPATA2 objects with custom VisiumHD resolutions beyond the standard ones (16um, 8um and 2um).
resizeImage() for memory friendly handling of large images.

SPATA2 v2.0.4 -> v3.0.0

With the publication of our manuscript Kueckelhaus et al. 2024, we are delighted to release version 3.0.0 of SPATA2. It brings several changes and improvements: a revised approach to analyzing gene expression as gradients, a new S4 object structure, new naming of functions and arguments, and compatibility with platforms beyond Visium.

S4 architecture

Similar to Seurat, SPATA2 is centered around an S4 object that contains data and tracks progress in preprocessing and downstream analysis. In version 3.0.0, the object architecture has been significantly revised, introducing a new S4 class. The previous spata2 class, designed in 2019, has been replaced by the new SPATA2 class. This update includes new and renamed slots, and the spatial multi-omic study components are now more elegantly divided into new S4 classes, inspired partly by Seurat’s structure (e.g., creating an S4 object for each raw count matrix as a MolecularAssay object).

Understanding the exact architecture of the SPATA2 objects is complex and generally unnecessary. However, detailed documentation is available for those interested. Each class and its subclasses are extensively documented. Use the question mark operator like this ?SPATA2, ?MolecularAssay, ?SpatialData to get more details, or visit the S4-classes section on the reference page.

Platform support

SPATA2 v3.0.0 offers enhanced support for platforms like MERFISH, SlideSeq, Xenium, and more. The new S4 structure and classes SPATA2 uses provide a clearer framework for creating individual SPATA2 objects with specific spatial data sets. For each platform, we have created specialized tutorials for initiation and suggested processing steps. Note that all initiateSpataObject_*() functions from version 2.0.4 have been deprecated in favor of new initiation functions. Please refer to the respective tutorials via the dropdown button: Articles >> Object Initiation & Processing.

Deprecated functions, renaming and new naming conventions

“There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton

With the improvements made in SPATA2 v3.0.0, multiple aspects and concepts have been renamed or their naming has been expanded. This has consequences for function names and thus for your code. Where we deprecated a function, it still works: it automatically calls its replacement in the background and prints a warning that the function you just used is deprecated and likely to disappear in the near future.


## the term expression matrix has been abandoned in favor of processed matrix
# this function throws a warning...
getExpressionMatrix(object, mtr_name = "LogNormalize")[1:5, 1:5]
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getExpressionMatrix()` is deprecated and will be deleted in
## the near future. Please use `getProcessedMatrix() or getMatrix()` instead.
## 5 x 5 sparse Matrix of class "dgCMatrix"
##            AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1
## AL627309.1                  .                  .                  .
## AL627309.3                  .                  .                  .
## AL669831.5                  .                  .                  .
## FAM87B                      .                  .                  .
## LINC00115                   .                  .                  .
##            AAACAGAGCGACTCCT-1 AAACAGCTTTCAGAAG-1
## AL627309.1                  .                  .
## AL627309.3                  .                  .
## AL669831.5                  .                  .
## FAM87B                      .                  .
## LINC00115                   .                  .

# ... and calls this one right after - your code should still work
getProcessedMatrix(object, mtr_name = "LogNormalize")[1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
##            AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1
## AL627309.1                  .                  .                  .
## AL627309.3                  .                  .                  .
## AL669831.5                  .                  .                  .
## FAM87B                      .                  .                  .
## LINC00115                   .                  .                  .
##            AAACAGAGCGACTCCT-1 AAACAGCTTTCAGAAG-1
## AL627309.1                  .                  .
## AL627309.3                  .                  .
## AL669831.5                  .                  .
## FAM87B                      .                  .
## LINC00115                   .                  .

However, multiple changes have been made, and code you wrote in the past might no longer work with SPATA2 v3.0.0. We apologize for that. If the automatically generated warning messages do not guide you, or the function you want to use simply no longer exists, please raise an issue that includes the code you used with SPATA2 v2.0.4, and we will help you write code that obtains the same results with SPATA2 v3.0.0.
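The deprecation behavior described in this section can be pictured as a thin wrapper around the new function. The following is a minimal sketch in base R, not the SPATA2 implementation: all function names (get_processed_matrix, get_expression_matrix, deprecate_in_favor_of) are hypothetical stand-ins, and the real package routes its messages through its own feedback helper.

```r
# Hypothetical stand-in for a new-style accessor; not a SPATA2 function.
get_processed_matrix <- function(mtr_list, mtr_name) {
  mtr_list[[mtr_name]]
}

# Build a deprecated alias: warn first, then forward all arguments unchanged.
deprecate_in_favor_of <- function(old_name, new_name, new_fn) {
  function(...) {
    warning(
      sprintf(
        "Function `%s()` is deprecated. Please use `%s()` instead.",
        old_name, new_name
      ),
      call. = FALSE
    )
    new_fn(...)
  }
}

# Hypothetical old-style name that forwards to the new accessor.
get_expression_matrix <- deprecate_in_favor_of(
  old_name = "get_expression_matrix",
  new_name = "get_processed_matrix",
  new_fn = get_processed_matrix
)

mtr_list <- list(LogNormalize = matrix(1:4, nrow = 2))

# The old name still returns the same result as the new one.
old_result <- suppressWarnings(
  get_expression_matrix(mtr_list, mtr_name = "LogNormalize")
)
new_result <- get_processed_matrix(mtr_list, mtr_name = "LogNormalize")
identical(old_result, new_result)
```

Because the wrapper forwards its arguments via `...`, old code keeps working as long as the argument names of the old and new function agree. Where arguments were renamed as well, a wrapper like this would need an explicit mapping.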

Active and inactive

The new SPATA2 object facilitates handling multiple data modalities. Often, the SPATA2 object contains more than one image (e.g., lowres and hires) or more than one data matrix (e.g. raw counts, log normalized counts, scaled). Functions that access data must choose one by default. Whenever there are two or more options, the function will default to what has been declared as active by the activate*() functions. The current active state can be checked with the corresponding active*() functions.

In previous versions, there was no uniform naming convention for functions dealing with activation. Now, there is. Consequently, functions like setActive*() or getActive*() are deprecated.


# names of all registered images
getImageNames(object)
## [1] "lowres" "hires"

# currently active image is "lowres"
activeImage(object)
## [1] "lowres"

# activate different image
object <- activateImage(object, img_name = "hires")
## 02:04:51 No image directory found and/or the directory does not exist on this device. Did not unload image lowres.
## 02:04:51 Loading image hires.
## 02:04:51 Active image: 'hires'.

# currently active image has changed to "hires"
activeImage(object)
## [1] "hires"
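The active/inactive pattern boils down to an accessor whose default argument is looked up from the object itself. The sketch below uses a plain list as a toy container. It is not the SPATA2 object, and the names toy, active_image, activate_image and get_image are made up for illustration.

```r
# Toy container, not the SPATA2 object: named images plus the active name.
toy <- list(
  images = list(lowres = "low-resolution image", hires = "high-resolution image"),
  active_image = "lowres"
)

# Report which image is currently active.
active_image <- function(obj) obj$active_image

# Declare a different image as active and return the modified container.
activate_image <- function(obj, img_name) {
  stopifnot(img_name %in% names(obj$images))
  obj$active_image <- img_name
  obj
}

# The accessor falls back to the active image when img_name is not given.
get_image <- function(obj, img_name = active_image(obj)) {
  obj$images[[img_name]]
}

get_image(toy)                       # uses the active image "lowres"
toy <- activate_image(toy, "hires")
get_image(toy)                       # now uses "hires"
get_image(toy, img_name = "lowres")  # an explicit choice overrides the default
```

As in the real workflow, activation returns a modified object, so the result has to be assigned back to take effect.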

Spatial Annotation Screening (previously Image Annotation Screening)

The concept has been expanded to include screening based on annotations that highlight not only image-related histomorphological features but also areas with very high (or very low) expression of numeric features or specific clusters. Image annotations are now part of the broader term Spatial Annotations, which includes Image Annotations, Numeric Annotations, and Group Annotations. Each type provides high-level functions to create necessary spatial reference outlines for gradient-based analysis. Consequently, Image Annotation Screening (IAS) has been renamed to Spatial Annotation Screening (SAS). This affects every function, changing Ias to Sas, and binwidth to resolution. See our new tutorial here.

Additionally, the mathematical approach for obtaining gradients has changed during the review process. This includes how distance to the spatial outline is integrated, allowing the integration of multiple annotations of the same kind (see Main Figure 4a-b of Kueckelhaus et al. 2024). The parameter binwidth has been replaced by resolution, with the input remaining the same. The term binwidth no longer fits the approach, hence the change. For a detailed description of how gradients are obtained and p-values computed, refer to the methods section and supplementary figures of the manuscript Kueckelhaus et al. 2024.


## the algorithm
# previously (now deprecated)
ias_out <- imageAnnotationScreening(object, ids = c("first_id", "sec_id"), binwidth = "100um", ...)

# now
sas_out <- spatialAnnotationScreening(object, ids = c("first_id", "sec_id"), resolution = "100um", ...)

## plotting functions
# previously (now deprecated)
plotIasLineplot(object, ids = c("first_id", "second_id"), variables = "EGFR", binwidth = "100um")

# now
plotSasLineplot(object, ids = c("first_id", "second_id"), variables = "EGFR", resolution = "100um")

Spatial Trajectory Screening is largely unaffected by these changes. The methodology for calculating gradients based on spatial trajectories has been adjusted accordingly, but apart from the renaming of binwidth to resolution, little has been altered. Furthermore, the overall concept of inferring expression gradients related to spatial reference features (Spatial Trajectories or Spatial Annotations) is now termed Spatial Gradient Screening.

Molecular assays

The core of any spatial transcriptomics data set is a raw integer matrix of mRNA counts, with rows that correspond to gene names and columns that correspond to observation identifiers (barcodes, in SPATA2 terms). This count matrix can be processed in multiple ways, and downstream analysis can follow multiple steps. While gene expression is of paramount importance in spatial omic studies, spatial analysis can also focus on or integrate metabolomic and proteomic data. Hence, we expanded the naming and the concept to molecules to allow future integration of these kinds of data, too. The assay name corresponds to the data modality of the assay. SPATA2 knows the modalities gene, protein and metabolite.


## assume that you have a SPATA2 object from the Visium platform
# and you vertically integrated protein expression data 

# works as usual
gene_names <- getGenes(object)

# results in the same output 
gene_names <- getMolecules(object, assay_name = "gene")

# the hypothetical data set also contains an assay of spatially resolved protein data
protein_names <- getProteins(object)

protein_names <- getMolecules(object, assay_name = "protein")

Molecular data matrices used to be stored in a loose list in slot @data. Within the new SPATA2 object, molecular data is now structured in objects of class MolecularAssay (partly inspired by the Seurat Assay-class). Among other things these objects contain a slot called @mtr_counts which contains the raw count matrix and a slot called @mtr_proc, a named list of matrices that resulted from different processing steps. Hence, SPATA2 differentiates between raw count matrix as obtained via getCountMatrix() and processed matrices as obtained via getProcessedMatrix(). In previous versions of SPATA2, we used the term expression matrix to refer to processed gene expression matrices. This naming convention has been abandoned in favor of raw counts vs. processed.


## extraction of raw counts remains the same apart from the new assay_name argument which defaults
# to the name of the active assay as obtained by activeAssay(object)
count_mtr <- getCountMatrix(object)

## extract a normalized matrix
# previously (now deprecated)
norm_mtr <- getExpressionMatrix(object, mtr_name = "LogNormalize")
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getExpressionMatrix()` is deprecated and will be deleted in
## the near future. Please use `getProcessedMatrix() or getMatrix()` instead.

# now
norm_mtr <- getProcessedMatrix(object, mtr_name = "LogNormalize")

Analysis results that are directly linked to the molecular data, such as SPARKX, DEA, GSEA, CNV etc., are stored in slot @analysis of the assay. The functions to extract or plot these results, like getSparkxGeneDf(), getDeaResultsDf(), etc., remain the same.
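To make the raw vs. processed distinction concrete, here is a minimal S4 sketch using the methods package. ToyAssay is a made-up class for illustration, not the actual MolecularAssay definition. It only mirrors the slots named in this section (@mtr_counts, @mtr_proc, @analysis) plus a hypothetical @modality slot, and the two accessors are hypothetical as well.

```r
library(methods)

# Toy class for illustration; the real MolecularAssay class is documented
# under ?MolecularAssay and is not reproduced here.
setClass(
  "ToyAssay",
  representation(
    modality = "character",  # hypothetical slot, e.g. "gene" or "protein"
    mtr_counts = "matrix",   # raw count matrix
    mtr_proc = "list",       # named list of processed matrices
    analysis = "list"        # results linked to the molecular data
  )
)

counts <- matrix(
  c(0, 3, 1, 7),
  nrow = 2,
  dimnames = list(c("GENE_A", "GENE_B"), c("bc_1", "bc_2"))
)

assay <- new(
  "ToyAssay",
  modality = "gene",
  mtr_counts = counts,
  mtr_proc = list(log1p = log1p(counts)),  # one simple processing step
  analysis = list()
)

# Hypothetical accessors mirroring the raw vs. processed distinction.
toy_count_matrix <- function(assay) assay@mtr_counts

toy_processed_matrix <- function(assay, mtr_name) {
  stopifnot(mtr_name %in% names(assay@mtr_proc))
  assay@mtr_proc[[mtr_name]]
}

toy_count_matrix(assay)
toy_processed_matrix(assay, mtr_name = "log1p")
```

Keeping the processed matrices in a named list is what allows one raw count matrix to coexist with any number of processing results, each addressed by name.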

Meta data (previously feature data)

Previously, we stored meta variables like clustering and quality control variables in the slot @fdata, short for feature data, based on the idea of “features that are not directly related to molecular counts.” However, in other packages like Seurat or Squidpy, the term feature often refers to numeric variables, including gene expression, causing understandable confusion. To address this, we renamed the slot. As shown in the documentation of the new SPATA2 class, the slot is now called @meta_obs, short for metadata of the SPATA2 object’s observations (barcoded spots, cells, etc.).


# previously (now deprecated)
getFeatureDf(object)
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getFeatureDf()` is deprecated and will be deleted in the near
## future. Please use `getMetaDf()` instead.
## # A tibble: 3,517 × 128
##    barcodes         sample nCount_Spatial nFeature_Spatial percent.mt percent.RB
##    <chr>            <chr>           <dbl>            <int>      <dbl>      <dbl>
##  1 AAACAAGTATCTCCC… T313             4319             2164      2.17        6.54
##  2 AAACAATCTACTAGC… T313             6005             2796      2.10        8.46
##  3 AAACACCAATAACTG… T313              349              254      4.18        7.74
##  4 AAACAGAGCGACTCC… T313            11394             4248      2.85        8.63
##  5 AAACAGCTTTCAGAA… T313              902              616      1.72        6.33
##  6 AAACAGGGTCTATAT… T313              175              144      5.49        6.33
##  7 AAACATGGTGAGAGG… T313             8327             2958      0.870       8.11
##  8 AAACCCGAACGAAAT… T313             2940             1595      2.21        7.21
##  9 AAACCGGGTAGGTAC… T313               96               83      1.57       10.2 
## 10 AAACCGTTCGTCCAG… T313              671              394      1.99        8.18
## # ℹ 3,507 more rows
## # ℹ 122 more variables: Spatial_snn_res.0.8 <fct>, seurat_clusters <fct>,
## #   AC_like <dbl>, AC_like_Prolif <dbl>, MES_like_hypoxia_independent <dbl>,
## #   MES_like_hypoxia_MHC <dbl>, NPC_like_neural <dbl>, NPC_like_OPC <dbl>,
## #   NPC_like_Prolif <dbl>, OPC_like <dbl>, OPC_like_Prolif <dbl>, cDC1 <dbl>,
## #   cDC2 <dbl>, DC1 <dbl>, DC2 <dbl>, DC3 <dbl>, Mast <dbl>,
## #   Mono_anti_infl <dbl>, Mono_hypoxia <dbl>, Mono_naive <dbl>, pDC <dbl>, …

# now 
getMetaDf(object)
## # A tibble: 3,517 × 128
##    barcodes         sample nCount_Spatial nFeature_Spatial percent.mt percent.RB
##    <chr>            <chr>           <dbl>            <int>      <dbl>      <dbl>
##  1 AAACAAGTATCTCCC… T313             4319             2164      2.17        6.54
##  2 AAACAATCTACTAGC… T313             6005             2796      2.10        8.46
##  3 AAACACCAATAACTG… T313              349              254      4.18        7.74
##  4 AAACAGAGCGACTCC… T313            11394             4248      2.85        8.63
##  5 AAACAGCTTTCAGAA… T313              902              616      1.72        6.33
##  6 AAACAGGGTCTATAT… T313              175              144      5.49        6.33
##  7 AAACATGGTGAGAGG… T313             8327             2958      0.870       8.11
##  8 AAACCCGAACGAAAT… T313             2940             1595      2.21        7.21
##  9 AAACCGGGTAGGTAC… T313               96               83      1.57       10.2 
## 10 AAACCGTTCGTCCAG… T313              671              394      1.99        8.18
## # ℹ 3,507 more rows
## # ℹ 122 more variables: Spatial_snn_res.0.8 <fct>, seurat_clusters <fct>,
## #   AC_like <dbl>, AC_like_Prolif <dbl>, MES_like_hypoxia_independent <dbl>,
## #   MES_like_hypoxia_MHC <dbl>, NPC_like_neural <dbl>, NPC_like_OPC <dbl>,
## #   NPC_like_Prolif <dbl>, OPC_like <dbl>, OPC_like_Prolif <dbl>, cDC1 <dbl>,
## #   cDC2 <dbl>, DC1 <dbl>, DC2 <dbl>, DC3 <dbl>, Mast <dbl>,
## #   Mono_anti_infl <dbl>, Mono_hypoxia <dbl>, Mono_naive <dbl>, pDC <dbl>, …

Management of spatial data

SPATA2 focuses on spatial data and on integrating multiple spatial analysis approaches. Consequently, spatial data now has its own slot, @spatial. This slot was previously used for miscellaneous spatial data; it now holds a clearly structured S4 object of class SpatialData. For more details on the class, call ?SpatialData. This new architecture improves the management of spatial aspects and alignment challenges. Here are the most important aspects:

Image

The SPATA2 object allows the storage of multiple images by registering them individually using registerImage(). Each registered image creates a container object of class HistoImage, storing the image or its file directory, along with additional information and data from image processing steps (e.g., tissue outline, alignment).

Working with multiple images, alongside the coordinates of data points and spatial reference features (SpatialAnnotation, SpatialTrajectory), requires careful alignment. Alignment involves matching image resolution and adjusting images for angle, horizontal or vertical translation, and stretching. This is crucial for images of neighboring tissue sections that are similar but not perfectly overlapping.

The Reference Image

To facilitate alignment and integration of multiple images, coordinates, and spatial reference features, a reference image is designated. By default, this is the first image loaded into the SPATA2 object. SPATA2 assumes that coordinates and the reference image align perfectly in terms of vertical and horizontal justification (scaling may still be needed). Aligning additional images to the reference image ensures alignment with the data point coordinates. Additionally, the reference image allows automatic transfer of scale factors to newly registered images.
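One way to picture the transfer of scale factors is simple proportionality. The sketch below rests on an assumption that is not taken from the SPATA2 source: the reference image and the new image cover the same field of view, so a scale factor grows linearly with pixel width. The function name and the numbers are hypothetical.

```r
# Assumption: both images show the same field of view, so the scale factor
# is proportional to image width in pixels. Not the SPATA2 implementation.
transfer_scale_factor <- function(ref_scale_fct, ref_width_px, new_width_px) {
  stopifnot(ref_width_px > 0, new_width_px > 0)
  ref_scale_fct * new_width_px / ref_width_px
}

# Hypothetical numbers: reference image 600 px wide, new image 2000 px wide.
new_isf <- transfer_scale_factor(
  ref_scale_fct = 0.03,
  ref_width_px = 600,
  new_width_px = 2000
)
new_isf
```

If the new image shows a different field of view, for instance a neighboring tissue section, a width ratio alone is not enough and explicit alignment is required first.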

The Active Image

The active image is the default image used in functions requiring image input, such as createImageAnnotations() or plotSpatialAnnotations(), and for extracting coordinates. During extraction, coordinates are scaled to the resolution of the currently active image. The reference image can also serve as the active image.

Scaling coordinates

SPATA2 objects allow for the simultaneous storage of multiple images, which may differ in resolution, requiring different scale factors to align spatial data coordinates accurately with the images. In version 3.0.0, coordinate variables are differentiated as x_orig, y_orig (original coordinates) and x, y (scaled coordinates).

The rationale: Whenever spatial data (coordinates of barcode spots, coordinates of cells, tissue outlines, spatial annotations, spatial trajectories, etc.) are stored, they are recorded as x_orig and y_orig. Even if you are currently working within the resolution of a different image, the coordinates are scaled back to the “original resolution” of your reference image before storage. Upon extraction, a scaling process is applied to any spatial data, such as in getSpatAnnOutlineDf() or getTissueOutlineDf(). This process ensures that coordinates are appropriately scaled to the resolution of the active image. For instance, this scaling occurs in the background when extracting or plotting barcoded spots of a Visium data set in the context of a hypothetical image, lowres.

First, the original data.frame is obtained.


# `as_is` = TRUE, skips any processing steps
coords_df <- getCoordsDf(object, as_is = TRUE)[1:10, ]
## Warning in check_object(object): SPATA2 object is of version 3.1.0. Current
## SPATA2 version is 3.1.3. Please use `updateSpataObject()`.

coords_df
## # A tibble: 10 × 7
##    barcodes           x_orig y_orig   row   col section          section_dbscan
##    <chr>               <dbl>  <dbl> <int> <int> <fct>            <chr>         
##  1 TAACCGTCCAGTTCAT-1   4002   2342     1    15 tissue_section_1 1             
##  2 CGCGTGCTATCAACGA-1   5189   2340     1    27 tissue_section_1 1             
##  3 TGGTGTGACAGACGAT-1   2716   2516     2     2 tissue_section_1 1             
##  4 ATCTATCGATGATCAA-1   2815   2688     3     3 tissue_section_1 1             
##  5 CGGTAACAAGATACAT-1   2914   2516     2     4 tissue_section_1 1             
##  6 TCGCCGGAGAGTCTTA-1   3013   2688     3     5 tissue_section_1 1             
##  7 GGAGGAGTGTGTTTAT-1   3112   2515     2     6 tissue_section_1 1             
##  8 TTAGGTGTGACTGGTC-1   3211   2687     3     7 tissue_section_1 1             
##  9 CAGGGCTAACGAAACC-1   3310   2515     2     8 tissue_section_1 1             
## 10 CCCGTGGGTTAATTGA-1   3409   2687     3     9 tissue_section_1 1

Second, the image scale factor of image lowres is extracted.


# extract an image scale factor for the image named "lowres"
isf <- getScaleFactor(object, img_name = "lowres", fct_name = "image")

isf
## [1] 0.03460208

Third, the variables are transformed (scaled) to x and y.


# create variables x and y by applying the scale factor to the original coordinates
coords_df$x <- coords_df$x_orig * isf
coords_df$y <- coords_df$y_orig * isf

coords_df[,c("barcodes", "x_orig", "y_orig", "x", "y")]
## # A tibble: 10 × 5
##    barcodes           x_orig y_orig     x     y
##    <chr>               <dbl>  <dbl> <dbl> <dbl>
##  1 TAACCGTCCAGTTCAT-1   4002   2342 138.   81.0
##  2 CGCGTGCTATCAACGA-1   5189   2340 180.   81.0
##  3 TGGTGTGACAGACGAT-1   2716   2516  94.0  87.1
##  4 ATCTATCGATGATCAA-1   2815   2688  97.4  93.0
##  5 CGGTAACAAGATACAT-1   2914   2516 101.   87.1
##  6 TCGCCGGAGAGTCTTA-1   3013   2688 104.   93.0
##  7 GGAGGAGTGTGTTTAT-1   3112   2515 108.   87.0
##  8 TTAGGTGTGACTGGTC-1   3211   2687 111.   93.0
##  9 CAGGGCTAACGAAACC-1   3310   2515 115.   87.0
## 10 CCCGTGGGTTAATTGA-1   3409   2687 118.   93.0

If the object does not contain an image, as with MERFISH or Xenium, the @images slot of the SpatialData object is empty. Without images, there is no need for alignment or image scale factors. In this case, the x and y variables in the data frame will be equal to x_orig and y_orig. This might sound redundant or unnecessary. However, to accommodate potential future options for aligning images of adjacent tissue slides to datasets without original images, the *_orig coordinate structure is maintained, even if no images are present during the initial object creation.
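The store-as-original, scale-on-extraction idea, including the no-image case, can be sketched with two small helpers. Both are hypothetical base R functions, not SPATA2 functions. A scale factor of 1 stands in for an object without images, and the coordinates and scale factor are taken from the example above.

```r
# Hypothetical helper: derive x and y from x_orig and y_orig on extraction.
# The default isf = 1 stands in for the no-image case.
scale_coords <- function(coords_df, isf = 1) {
  coords_df$x <- coords_df$x_orig * isf
  coords_df$y <- coords_df$y_orig * isf
  coords_df
}

# Hypothetical inverse: bring coordinates created in the frame of an image
# back to the original frame before they are stored.
unscale_coords <- function(coords_df, isf = 1) {
  coords_df$x_orig <- coords_df$x / isf
  coords_df$y_orig <- coords_df$y / isf
  coords_df
}

coords_df <- data.frame(
  barcodes = c("TAACCGTCCAGTTCAT-1", "CGCGTGCTATCAACGA-1"),
  x_orig = c(4002, 5189),
  y_orig = c(2342, 2340)
)

# With an image: x and y live in the pixel frame of that image.
with_image <- scale_coords(coords_df, isf = 0.03460208)

# Without an image: x and y are equal to x_orig and y_orig.
without_image <- scale_coords(coords_df)

# Round trip: scaled coordinates map back to the original ones.
round_trip <- unscale_coords(
  with_image[, c("barcodes", "x", "y")],
  isf = 0.03460208
)
```

Keeping x_orig and y_orig as the single stored representation means that switching the active image only changes the factor applied on extraction, never the stored data.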