To update your object to be suitable for the newest version, just run:
# the output is an object fit for the version of SPATA2 you have installed
my_object <- updateSpataObject(object = my_object)
SPATA2 v3.1.0 comes with the following major changes and fixes.
Changes:
spatialTrajectoryScreening() in case of missing capture areas.
ggpLayerSpatAnnOutline() with expand_outline = 0.
createSpatialSegmentation().
Additions:
reduceResolutionVisiumHD() to create SPATA2 objects with individual VisiumHD resolution beyond the standard (16um, 8um and 2um).
resizeImage() for memory-friendly handling of large images.
With the publication of our manuscript Kueckelhaus et al. 2024, we are delighted to release version 3.0.0 of SPATA2. It brings several changes and improvements to the gradient-based analysis of gene expression, the S4 object structure, the naming of functions and arguments, and compatibility with platforms beyond Visium.
Similar to Seurat, SPATA2 is centered around an S4 object that contains
data and tracks progress in preprocessing and downstream analysis. In
version 3.0.0, the object architecture has been significantly revised,
introducing a new S4 class. The previous spata2 class, designed in 2019,
has been replaced by the new SPATA2 class. This update includes new and
renamed slots, and the spatial multi-omic study components are now more
elegantly divided into new S4 classes, inspired partly by Seurat’s
structure (e.g., creating an S4 object for each raw count matrix as a
MolecularAssay object).
Understanding the exact architecture of the SPATA2 objects is complex
and generally unnecessary. However, detailed documentation is available
for those interested. Each class and its subclasses are extensively
documented. Use the question mark operator like this ?SPATA2,
?MolecularAssay, ?SpatialData to get more details, or visit the
S4-classes section on the reference page.
SPATA2 v3.0.0 offers enhanced support for platforms like MERFISH,
SlideSeq, Xenium, and more. The new S4 structure and classes SPATA2 uses
provide a clearer framework for creating individual SPATA2 objects with
specific spatial data sets. For each platform, we have created
specialized tutorials for initiation and suggested processing steps.
Note that all initiateSpataObject_*() functions from version 2.0.4 have
been deprecated in favor of new initiation functions. Please refer to
the respective tutorials via the dropdown button:
Articles >> Object Initiation & Processing.
“There are only two hard things in Computer Science: cache invalidation and naming things.” - Phil Karlton
In SPATA2 v3.0.0, several aspects and concepts have been renamed or their naming has been broadened. This affects function names and therefore your code. The functions we deprecated still work: each one calls its replacement in the background and prints a warning that the function you just used is deprecated and likely to disappear in the near future.
## the term expression matrix has been abandoned in favor of processed matrix
# this function throws a warning...
getExpressionMatrix(object, mtr_name = "LogNormalize")[1:5, 1:5]
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getExpressionMatrix()` is deprecated and will be deleted in
## the near future. Please use `getProcessedMatrix() or getMatrix()` instead.
## 5 x 5 sparse Matrix of class "dgCMatrix"
## AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1
## AL627309.1 . . .
## AL627309.3 . . .
## AL669831.5 . . .
## FAM87B . . .
## LINC00115 . . .
## AAACAGAGCGACTCCT-1 AAACAGCTTTCAGAAG-1
## AL627309.1 . .
## AL627309.3 . .
## AL669831.5 . .
## FAM87B . .
## LINC00115 . .
# ... and calls this one right after - your code should still work
getProcessedMatrix(object, mtr_name = "LogNormalize")[1:5, 1:5]
## 5 x 5 sparse Matrix of class "dgCMatrix"
## AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1
## AL627309.1 . . .
## AL627309.3 . . .
## AL669831.5 . . .
## FAM87B . . .
## LINC00115 . . .
## AAACAGAGCGACTCCT-1 AAACAGCTTTCAGAAG-1
## AL627309.1 . .
## AL627309.3 . .
## AL669831.5 . .
## FAM87B . .
## LINC00115 . .
However, multiple changes have been made and there might be code you wrote in the past that does not work any longer with SPATA2 v3.0.0. We apologize for that. If the automatically generated warning messages do not guide you or the function you want to use simply does not exist any longer, please raise an issue including the code of SPATA2 v2.0.4 and we will help you to write the code to obtain the same results with SPATA2 v3.0.0.
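The deprecation behavior described above (the old function warns, then forwards to its replacement) can be sketched in base R. This is a minimal illustration and not SPATA2's implementation: get_processed() and get_expression() are hypothetical stand-ins, and SPATA2 itself routes its message through confuns::give_feedback(), as the warning output above shows.

```r
# Hypothetical stand-in for a renamed function (not part of SPATA2).
get_processed <- function(mtr_list, mtr_name) {
  mtr_list[[mtr_name]]
}

# Hypothetical deprecated wrapper: warns, then forwards all arguments
# to the new function so that old code keeps working.
get_expression <- function(...) {
  warning(
    "Function `get_expression()` is deprecated. ",
    "Please use `get_processed()` instead.",
    call. = FALSE
  )
  get_processed(...)
}

# toy data: a named list holding one processed matrix
mtr_list <- list(LogNormalize = matrix(1:4, nrow = 2))

# the old call returns the same result as the new one (warning silenced here)
old_out <- suppressWarnings(get_expression(mtr_list, mtr_name = "LogNormalize"))
new_out <- get_processed(mtr_list, mtr_name = "LogNormalize")
identical(old_out, new_out)
```

Because the wrapper forwards its arguments unchanged, code written against the old name only needs the function name swapped to silence the warning.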
The new SPATA2 object facilitates handling multiple data modalities.
Often, the SPATA2 object contains more than one image (e.g., lowres and
hires) or more than one data matrix (e.g. raw counts, log normalized
counts, scaled). Functions that access data must choose one by default.
Whenever there are two or more options, the function will default to
what has been declared as active by the activate*() functions. The
current active state can be checked with the corresponding active*()
functions.
In previous versions, there was no uniform naming convention for
functions dealing with activation. Now, there is. Consequently,
functions like setActive*() or getActive*() are deprecated.
# names of all registered images
getImageNames(object)
## [1] "lowres" "hires"
# currently active image is "lowres"
activeImage(object)
## [1] "lowres"
# activate different image
object <- activateImage(object, img_name = "hires")
## 01:12:01 No image directory found and/or the directory does not exist on this device. Did not unload image {object@name}.
## 01:12:01 Loading image hires.
## 01:12:02 Active image: 'hires'.
# currently active image has changed to "hires"
activeImage(object)
## [1] "hires"
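The idea of an accessor that falls back to whatever is currently active can be illustrated with a toy example in base R. Everything here is hypothetical: toy_object is a plain list rather than a SPATA2 object, and active_image(), activate_image() and get_image() only mimic the naming pattern of the real functions.

```r
# Toy object: a plain list, NOT the SPATA2 class.
toy_object <- list(
  images = list(lowres = "pixels of lowres", hires = "pixels of hires"),
  active_image = "lowres"
)

# returns the name of the currently active image
active_image <- function(obj) obj$active_image

# declares a different image as active and returns the modified object
activate_image <- function(obj, img_name) {
  stopifnot(img_name %in% names(obj$images))
  obj$active_image <- img_name
  obj
}

# accessor that defaults to the active image if none is specified
get_image <- function(obj, img_name = active_image(obj)) {
  obj$images[[img_name]]
}

get_image(toy_object)                        # falls back to "lowres"
toy_object <- activate_image(toy_object, img_name = "hires")
get_image(toy_object)                        # now falls back to "hires"
get_image(toy_object, img_name = "lowres")   # explicit choice overrides the default
```

As with activateImage() above, the result of the activation must be assigned back to the object, since R functions do not modify their arguments in place.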
The concept of Image Annotation Screening has been expanded to include
screening based on annotations that highlight not only image-related
histomorphological features but also areas with very high (or very low)
expression of numeric features or specific clusters. Image annotations
are now part of the broader term Spatial Annotations, which includes
Image Annotations, Numeric Annotations, and Group Annotations. Each type
provides high-level functions to create necessary spatial reference
outlines for gradient-based analysis. Consequently, Image Annotation
Screening (IAS) has been renamed to Spatial Annotation Screening (SAS).
This affects every related function: Ias becomes Sas in function names,
and the argument binwidth becomes resolution.
See our new tutorial here.
Additionally, the mathematical approach for obtaining gradients has
changed during the review process. This includes how distance to the
spatial outline is integrated, allowing the integration of multiple
annotations of the same kind (see Main Figure 4a-b of Kueckelhaus et
al. 2024). The parameter binwidth has been replaced by resolution, with
the input remaining the same. The term binwidth no longer fits the
approach, hence the change. For a detailed description of how gradients
are obtained and p-values computed, refer to the methods section and
supplementary figures of the manuscript Kueckelhaus et al. 2024.
## the algorithm
# previously (now deprecated)
ias_out <- imageAnnotationScreening(object, ids = c("first_id", "sec_id"), binwidth = "100um", ...)
# now
sas_out <- spatialAnnotationScreening(object, ids = c("first_id", "sec_id"), resolution = "100um", ...)
## plotting functions
# previously (now deprecated)
plotIasLineplot(object, ids = c("first_id", "second_id"), variables = "EGFR", binwidth = "100um")
# now
plotSasLineplot(object, ids = c("first_id", "second_id"), variables = "EGFR", resolution = "100um")
Apart from the renaming of binwidth to resolution, Spatial Trajectory
Screening is largely unaffected by these changes. The methodology for
calculating gradients based on spatial trajectories has been adjusted
accordingly. Furthermore, the overall concept of inferring expression
gradients related to spatial reference features (Spatial Trajectories or
Spatial Annotations) is now termed Spatial Gradient Screening.
The core of any spatial transcriptomics data set is a raw integer matrix of mRNA counts whose rows correspond to gene names and whose columns correspond to observation identifiers (in SPATA2 terms, barcodes). This count matrix can be processed in multiple ways, and downstream analysis can follow multiple steps. While gene expression is of paramount importance in spatial omic studies, spatial analysis can also focus on or integrate metabolomic and proteomic data. Hence, we expanded the naming and the concept to molecules to allow future integration of these kinds of data, too. The assay name corresponds to the data modality of the assay. SPATA2 recognizes the modalities gene, protein and metabolite.
## assume that you have a SPATA2 object from the Visium platform
# and you vertically integrated protein expression data
# works as usual
gene_names <- getGenes(object)
# results in the same output
gene_names <- getMolecules(object, assay_name = "gene")
# hypothetical data set contains also an assay of spatially resolved protein data
protein_names <- getProteins(object)
protein_names <- getMolecules(object, assay_name = "protein")
Molecular data matrices used to be stored in a loose list in slot @data.
Within the new SPATA2 object, molecular data is now structured in
objects of class MolecularAssay (partly inspired by the Seurat
Assay-class). Among other things, these objects contain a slot called
@mtr_counts, which contains the raw count matrix, and a slot called
@mtr_proc, a named list of matrices that resulted from different
processing steps. Hence, SPATA2 differentiates between the raw count
matrix as obtained via getCountMatrix() and processed matrices as
obtained via getProcessedMatrix(). In previous versions of SPATA2, we
used the term expression matrix to refer to processed gene expression
matrices. This naming convention has been abandoned in favor of raw
counts vs. processed.
## extraction of raw counts remains the same apart from the new assay_name argument which defaults
# to the name of the active assay as obtained by activeAssay(object)
count_mtr <- getCountMatrix(object)
## extract a normalized matrix
# previously (now deprecated)
norm_mtr <- getExpressionMatrix(object, mtr_name = "LogNormalize")
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getExpressionMatrix()` is deprecated and will be deleted in
## the near future. Please use `getProcessedMatrix() or getMatrix()` instead.
# now
norm_mtr <- getProcessedMatrix(object, mtr_name = "LogNormalize")
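The separation of raw counts and processed matrices described above can be pictured with a toy S4 class. This sketch is an assumption-laden illustration: ToyAssay and its modality slot are hypothetical, only the slot names mtr_counts and mtr_proc are taken from the text, and the actual MolecularAssay class contains more than is shown here (see ?MolecularAssay).

```r
library(methods)

# Toy class with the two matrix slots named in the text; NOT the actual
# MolecularAssay definition.
setClass(
  "ToyAssay",
  slots = c(
    modality = "character",  # hypothetical slot, e.g. "gene" or "protein"
    mtr_counts = "matrix",   # raw count matrix
    mtr_proc = "list"        # named list of processed matrices
  )
)

# raw counts: molecules in rows, observations (barcodes) in columns
counts <- matrix(
  c(0L, 3L, 1L, 5L),
  nrow = 2,
  dimnames = list(c("GENE_A", "GENE_B"), c("bc_1", "bc_2"))
)

assay <- new(
  "ToyAssay",
  modality = "gene",
  mtr_counts = counts,
  mtr_proc = list(log1p = log1p(counts))  # one processed matrix, stored by name
)

# raw counts and processed matrices live side by side in the same object
assay@mtr_counts
names(assay@mtr_proc)
rownames(assay@mtr_counts)  # molecule names of this toy assay
```

Keeping the processed matrices in a named list is what makes an argument like mtr_name meaningful: the name selects one processing result among several.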
Analysis results that are directly linked to the molecular data, such as
SPARKX, DEA, GSEA, CNV etc., are stored in slot @analysis of the assay.
The functions to extract or plot the data, such as getSparkxGeneDf() and
getDeaResultsDf(), remain the same.
Previously, we stored meta variables like clustering and quality control
variables in the slot @fdata, short for feature data, based on the idea
of “features that are not directly related to molecular counts.”
However, in other packages like Seurat or Squidpy, the term feature
often refers to numeric variables, including gene expression, causing
understandable confusion. To address this, we renamed the slot. As shown
in the documentation of the new SPATA2 class, the slot is now called
@meta_obs, short for metadata of the SPATA2 object’s observations
(barcoded spots, cells, etc.).
# previously (now deprecated)
getFeatureDf(object)
## Warning in confuns::give_feedback(msg = msg, fdb.fn = fdb_fn, with.time =
## FALSE): Function `getFeatureDf()` is deprecated and will be deleted in the near
## future. Please use `getMetaDf()` instead.
## # A tibble: 3,517 × 128
## barcodes sample nCount_Spatial nFeature_Spatial percent.mt percent.RB
## <chr> <chr> <dbl> <int> <dbl> <dbl>
## 1 AAACAAGTATCTCCC… T313 4319 2164 2.17 6.54
## 2 AAACAATCTACTAGC… T313 6005 2796 2.10 8.46
## 3 AAACACCAATAACTG… T313 349 254 4.18 7.74
## 4 AAACAGAGCGACTCC… T313 11394 4248 2.85 8.63
## 5 AAACAGCTTTCAGAA… T313 902 616 1.72 6.33
## 6 AAACAGGGTCTATAT… T313 175 144 5.49 6.33
## 7 AAACATGGTGAGAGG… T313 8327 2958 0.870 8.11
## 8 AAACCCGAACGAAAT… T313 2940 1595 2.21 7.21
## 9 AAACCGGGTAGGTAC… T313 96 83 1.57 10.2
## 10 AAACCGTTCGTCCAG… T313 671 394 1.99 8.18
## # ℹ 3,507 more rows
## # ℹ 122 more variables: Spatial_snn_res.0.8 <fct>, seurat_clusters <fct>,
## # AC_like <dbl>, AC_like_Prolif <dbl>, MES_like_hypoxia_independent <dbl>,
## # MES_like_hypoxia_MHC <dbl>, NPC_like_neural <dbl>, NPC_like_OPC <dbl>,
## # NPC_like_Prolif <dbl>, OPC_like <dbl>, OPC_like_Prolif <dbl>, cDC1 <dbl>,
## # cDC2 <dbl>, DC1 <dbl>, DC2 <dbl>, DC3 <dbl>, Mast <dbl>,
## # Mono_anti_infl <dbl>, Mono_hypoxia <dbl>, Mono_naive <dbl>, pDC <dbl>, …
# now
getMetaDf(object)
## # A tibble: 3,517 × 128
## barcodes sample nCount_Spatial nFeature_Spatial percent.mt percent.RB
## <chr> <chr> <dbl> <int> <dbl> <dbl>
## 1 AAACAAGTATCTCCC… T313 4319 2164 2.17 6.54
## 2 AAACAATCTACTAGC… T313 6005 2796 2.10 8.46
## 3 AAACACCAATAACTG… T313 349 254 4.18 7.74
## 4 AAACAGAGCGACTCC… T313 11394 4248 2.85 8.63
## 5 AAACAGCTTTCAGAA… T313 902 616 1.72 6.33
## 6 AAACAGGGTCTATAT… T313 175 144 5.49 6.33
## 7 AAACATGGTGAGAGG… T313 8327 2958 0.870 8.11
## 8 AAACCCGAACGAAAT… T313 2940 1595 2.21 7.21
## 9 AAACCGGGTAGGTAC… T313 96 83 1.57 10.2
## 10 AAACCGTTCGTCCAG… T313 671 394 1.99 8.18
## # ℹ 3,507 more rows
## # ℹ 122 more variables: Spatial_snn_res.0.8 <fct>, seurat_clusters <fct>,
## # AC_like <dbl>, AC_like_Prolif <dbl>, MES_like_hypoxia_independent <dbl>,
## # MES_like_hypoxia_MHC <dbl>, NPC_like_neural <dbl>, NPC_like_OPC <dbl>,
## # NPC_like_Prolif <dbl>, OPC_like <dbl>, OPC_like_Prolif <dbl>, cDC1 <dbl>,
## # cDC2 <dbl>, DC1 <dbl>, DC2 <dbl>, DC3 <dbl>, Mast <dbl>,
## # Mono_anti_infl <dbl>, Mono_hypoxia <dbl>, Mono_naive <dbl>, pDC <dbl>, …
SPATA2 focuses on spatial data and integrating multiple spatial analysis
approaches. Consequently, spatial data now has its own slot, @spatial.
Previously used for miscellaneous spatial data, it now holds a clearly
structured S4 object of class SpatialData. For more details on the
class, call ?SpatialData. This new architecture improves the management
of spatial aspects and alignment challenges. Here are the most important
aspects:
The SPATA2 object allows the storage of multiple images by registering
them individually using registerImage(). Each registered image creates a
container object of class HistoImage, storing the image or its file
directory, along with additional information and data from image
processing steps (e.g., tissue outline, alignment).
Working with multiple images, alongside the coordinates of data points and spatial reference features (SpatialAnnotation, SpatialTrajectory), requires careful alignment. Alignment involves matching image resolution and adjusting images for angle, horizontal or vertical translation, and stretching. This is crucial for images of neighboring tissue sections that are similar but not perfectly overlapping.
To facilitate alignment and integration of multiple images, coordinates, and spatial reference features, a reference image is designated. By default, this is the first image loaded into the SPATA2 object. SPATA2 assumes that coordinates and the reference image align perfectly in terms of vertical and horizontal justification (scaling may still be needed). Aligning additional images to the reference image ensures alignment with the data point coordinates. Additionally, the reference image allows automatic transfer of scale factors to newly registered images.
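The adjustments named above (angle, translation and stretching) are the components of an affine transformation. The following base R sketch shows how such a transformation acts on a set of coordinates. It is a generic illustration, not SPATA2's alignment code: the helper transform_coords() and its arguments are hypothetical, and the order of operations (rotate, stretch, translate) is an assumption of this example.

```r
# Hypothetical helper: rotate (degrees, counter-clockwise around the origin),
# then stretch, then translate a set of x/y coordinates.
transform_coords <- function(df, angle = 0, stretch = c(1, 1), translate = c(0, 0)) {
  rad <- angle * pi / 180
  # 2 x 2 rotation matrix (filled column by column)
  rot <- matrix(c(cos(rad), sin(rad), -sin(rad), cos(rad)), nrow = 2)
  xy <- rbind(df$x, df$y)   # 2 x n matrix: row 1 holds x, row 2 holds y
  xy <- rot %*% xy          # rotation
  xy <- xy * stretch        # row-wise stretching: x by stretch[1], y by stretch[2]
  xy <- xy + translate      # horizontal and vertical translation
  data.frame(x = xy[1, ], y = xy[2, ])
}

pts <- data.frame(x = c(1, 0), y = c(0, 1))

# rotate by 90 degrees, double the width, shift 10 units to the right
aligned <- transform_coords(pts, angle = 90, stretch = c(2, 1), translate = c(10, 0))
aligned
```

With the default arguments the helper returns the coordinates unchanged, which corresponds to an image that already aligns with the reference image.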
The active image is the default image used in functions requiring image
input, such as createImageAnnotations() or plotSpatialAnnotations(), and
for extracting coordinates. During extraction, coordinates are scaled to
the resolution of the currently active image. The reference image can
also serve as the active image.
SPATA2 objects allow for the simultaneous storage of multiple images,
which may differ in resolution, requiring different scale factors to
align spatial data coordinates accurately with the images. In version
3.0.0, coordinate variables are differentiated as x_orig, y_orig
(original coordinates) and x, y (scaled coordinates).
The rationale: Whenever spatial data (coordinates of barcode spots,
coordinates of cells, tissue outlines, spatial annotations, spatial
trajectories, etc.) are stored, they are recorded as x_orig and y_orig.
Even if you are currently working within the resolution of a different
image, the coordinates are scaled back to the “original resolution” of
your reference image. Upon extraction, a scaling process is applied to
any spatial data, such as in getSpatAnnOutlineDf() or
getTissueOutlineDf(). This process ensures that coordinates are
appropriately scaled to the resolution of the active image. For
instance, this scaling occurs in the background when extracting or
plotting barcoded spots of a Visium data set in the context of a
hypothetical image, lowres.
First, the original data.frame is obtained.
# `as_is = TRUE` skips any processing steps
coords_df <- getCoordsDf(object, as_is = TRUE)[1:10, ]
## Warning in check_object(object): SPATA2 object is of version 3.1.0. Current
## SPATA2 version is 3.1.1. Please use `updateSpataObject()`.
coords_df
## # A tibble: 10 × 7
## barcodes x_orig y_orig row col section section_dbscan
## <chr> <dbl> <dbl> <int> <int> <fct> <chr>
## 1 TAACCGTCCAGTTCAT-1 4002 2342 1 15 tissue_section_1 1
## 2 CGCGTGCTATCAACGA-1 5189 2340 1 27 tissue_section_1 1
## 3 TGGTGTGACAGACGAT-1 2716 2516 2 2 tissue_section_1 1
## 4 ATCTATCGATGATCAA-1 2815 2688 3 3 tissue_section_1 1
## 5 CGGTAACAAGATACAT-1 2914 2516 2 4 tissue_section_1 1
## 6 TCGCCGGAGAGTCTTA-1 3013 2688 3 5 tissue_section_1 1
## 7 GGAGGAGTGTGTTTAT-1 3112 2515 2 6 tissue_section_1 1
## 8 TTAGGTGTGACTGGTC-1 3211 2687 3 7 tissue_section_1 1
## 9 CAGGGCTAACGAAACC-1 3310 2515 2 8 tissue_section_1 1
## 10 CCCGTGGGTTAATTGA-1 3409 2687 3 9 tissue_section_1 1
Second, the image scale factor of image lowres is extracted.
# extract an image scale factor for the image named "lowres"
isf <- getScaleFactor(object, img_name = "lowres", fct_name = "image")
isf
## [1] 0.03460208
Third, the variables are transformed (scaled) to x and y.
# create variables x and y by applying the scale factor to the original coordinates
coords_df$x <- coords_df$x_orig * isf
coords_df$y <- coords_df$y_orig * isf
coords_df[,c("barcodes", "x_orig", "y_orig", "x", "y")]
## # A tibble: 10 × 5
## barcodes x_orig y_orig x y
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 TAACCGTCCAGTTCAT-1 4002 2342 138. 81.0
## 2 CGCGTGCTATCAACGA-1 5189 2340 180. 81.0
## 3 TGGTGTGACAGACGAT-1 2716 2516 94.0 87.1
## 4 ATCTATCGATGATCAA-1 2815 2688 97.4 93.0
## 5 CGGTAACAAGATACAT-1 2914 2516 101. 87.1
## 6 TCGCCGGAGAGTCTTA-1 3013 2688 104. 93.0
## 7 GGAGGAGTGTGTTTAT-1 3112 2515 108. 87.0
## 8 TTAGGTGTGACTGGTC-1 3211 2687 111. 93.0
## 9 CAGGGCTAACGAAACC-1 3310 2515 115. 87.0
## 10 CCCGTGGGTTAATTGA-1 3409 2687 118. 93.0
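The text states that spatial data is stored in the original resolution and scaled upon extraction. A base R round trip makes the two directions explicit. This is a sketch under the assumption that scaling back is simply the inverse operation (division by the same image scale factor); it does not call SPATA2 and uses the scale factor and two spots from the output above.

```r
isf <- 0.03460208  # image scale factor of "lowres", taken from the output above

# two spots in original resolution (values from the table above)
stored <- data.frame(x_orig = c(4002, 5189), y_orig = c(2342, 2340))

# extraction: scale to the resolution of the active image
extracted <- transform(stored, x = x_orig * isf, y = y_orig * isf)

# storing: coordinates obtained on the active image are scaled back to the
# original resolution (assumed here to be a division by the same factor)
restored <- transform(extracted, x_orig = x / isf, y_orig = y / isf)

# the round trip recovers the original coordinates up to floating point error
all.equal(restored$x_orig, stored$x_orig)
all.equal(restored$y_orig, stored$y_orig)
```

Storing only the original coordinates means that a single stored set serves every registered image, with one scale factor per image.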
If the object does not contain an image, as with MERFISH or Xenium, the
@images slot of the SpatialData object is empty. Without images, there
is no need for alignment or image scale factors. In this case, the x and
y variables in the data frame will be equal to x_orig and y_orig. This
might sound redundant or unnecessary. However, to accommodate potential
future options for aligning images of adjacent tissue slides to datasets
without original images, the *_orig coordinate structure is maintained,
even if no images are present during the initial object creation.
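One way to think about the image-free case is as scaling with a factor of 1. The sketch below illustrates this in base R. The helper add_scaled_coords() is hypothetical and not part of SPATA2; treating a missing image as a scale factor of 1 is an assumption of this example that is consistent with x and y being equal to x_orig and y_orig.

```r
# Hypothetical helper: adds x and y to a data frame with x_orig and y_orig.
# If no scale factor is given (no image available), a factor of 1 is assumed.
add_scaled_coords <- function(df, isf = NULL) {
  if (is.null(isf)) isf <- 1
  df$x <- df$x_orig * isf
  df$y <- df$y_orig * isf
  df
}

# toy cell coordinates of an image-free data set
cells <- data.frame(x_orig = c(10.5, 20.25), y_orig = c(3, 7.5))

# without an image, x and y equal x_orig and y_orig
no_image <- add_scaled_coords(cells)
identical(no_image$x, no_image$x_orig)

# if an image is aligned later, only the scale factor changes
with_image <- add_scaled_coords(cells, isf = 0.5)
with_image
```

Because the stored coordinates never change, an image added later only introduces a new scale factor and leaves the existing data untouched.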