spata-v2-autoencoder.Rmd
Sequencing data from single-cell RNA sequencing (scRNA-seq) and from spatial transcriptomic experiments alike are prone to noise and technical artifacts that might obstruct downstram analysis. As recently proposed by Gökcen et al., 2019 autoencoder networks work against that. SPATA2 offers a similar approach to denoise data. Apart from data that makes more sense (see Figure 3.2 and 3.3) denoising your data often results in more insightful visualization.
library(SPATA2)
library(SPATAData)
spata_obj <- downloadSpataObject("275_T")
Note: Before you use the functions SPATA2 provides
please make sure that the R-packages keras
and
tensorflow
are installed on your device. Installing both
packages requires some steps in addition to what R does by default.
Therefore the installation of SPATA2 does not include
these two packages. Click here
for a guide on how to setup both packages.
Although there es only one function needed to denoise your data,
namely runAutoencoderDenoising()
, you might want to assess
how you set up the neural network that does the denoising - see chapter
2.2 for that matter.
Denoising your data in SPATA2 is as simple as it
gets. The function runAutoenconderDenoising()
constructs
the neural network according to your adjustments (see Chapter 2.2 on how
to optimize that) and stores the resulting denoised expression matrix
named denoised next to the already existing expression matrix
named scaled by using the function
addExpressionMatrix()
.
# all expression matrices before denoising
getExpressionMatrixNames(object = spata_obj)
# active expression matrix before denoising
getActiveMatrixName(object = spata_obj)
## [1] "scaled"
## [1] "scaled"
# denoising your data
spata_obj <-
runAutoencoderDenoising(
object = spata_obj,
activation = "selu",
bottleneck = 56,
epochs = 20,
layers = c(128, 64, 32),
dropout = 0.1
)
# all expression matrices after denoising
getExpressionMatrixNames(object = spata_obj)
# active expression matrix after denoising
getActiveMatrixName(object = spata_obj)
## [1] "scaled" "denoised"
## [1] "denoised"
Use printAutoencoderSummary()
if you want to recall the
way you set up the neural network that denoised your scaled data.
# print a summary
printAutoencoderSummary(object = spata_obj)
## The neural network that generated matrix 'denoised' was constructed with the following adjustments:
##
## Activation function: selu
## Bottleneck neurons: 56
## Dropout: 0.1
## Epochs: 20
## Layers: 128, 64 and 32
By default all functions in SPATA2 (like
plotSurface()
for example) use the expression matrix that
is set to the active expression matrix. Use
setActiveMatrix()
to switch between the expression matrices
that are currently stored inside your spata-object.
How neural networks are set up and how they work in detail is beyond
the scope of this tutorial. Among other things they differ in the number
of bottleneck neurons as well as in the activation
function they use. With runAutoencoderAssessment()
you
can iterate over all combinations you are interested in and assess the
results by the total variance of the principal component analysis
results.
spata_obj <-
runAutoencoderAssessment(
object = spata_obj,
activations = c("relu", "selu", "sigmoid"),
bottlenecks = c(32, 40, 48, 56, 64),
epochs = 20,
layers = c(128, 64, 32),
dropout = 0.1
)
plotAutoencoderAssessment(object = spata_obj)
## Additional set up of neural network:
##
## Epochs: 20
## Dropout: 0.1
## Layers: 128, 64 and 32
Sticking to the set up that resulted in the highest total variance is
a good start. If you want to extract the assessment results use
getAutoencoderAssessment()
. It returns a list of the total
variance results as well as the components the neural networks were
composed of.
You can store as many denoised expression matrices as you want in
your spata-object. Use the argument mtr_name_output
for
that matter. This makes runAutoencoderDenoising()
add the
denoised expression matrix under the name you specified instead of the
default name denoised. Keep in mind that the size of your
spata-object is mainly determined by the data matrices it contains and
that it can quickly grow to several gigabytes which might considerably
affect the speed with which your R session runs.
Figure 3.1 exemplifies how autoencoder denoising captures the main pattern and how this pattern is emphasized in the surface plot using the expression of gene MARCH6.
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "scaled")
plotSurface(
object = spata_obj,
color_by = "MARCH6",
pt_size = 1.8
) +
labs(title = "Scaled (not denoised)") +
legendNone()
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "denoised")
plotSurface(
object = spata_obj,
color_by = "MARCH6",
pt_size = 1.8
) +
labs(title = "Denoised")
Both, Figure 3.2 and Figure 3.3 go one step further. The example genes AIF1 and CD68 are both expressed in macrophages which, logically, should result in an almost perfect spatial overlap of both genes. Still, technical artifacts might interfere.
example_genes <- c("AIF1", "CD68")
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "scaled")
plotSurface(
object = spata_obj,
color_by = example_genes[1],
smooth = FALSE,
pt_size = 1.8
) +
labs(title = "Scaled (not denoised)")
plotSurface(
object = spata_obj,
color_by = example_genes[2],
smooth = FALSE,
pt_size = 1.8
) +
labs(title = "Scaled (not denoised)")
In addition to that, the fact that both genes naturally differ in expression levels might affect the colors used to represent these expression levels which additionally obscure the exclusive coexpression of AIF1 and CD68 on the same cells. Denoising does not lead to perfectly matching expression but it does a sufficient job in capturing the underlying pattern.
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "denoised")
plotSurface(
object = spata_obj,
color_by = example_genes[1],
smooth = FALSE,
pt_size = 1.8
) +
labs(title = "Denoised")
plotSurface(
object = spata_obj,
color_by = example_genes[2],
smooth = FALSE,
pt_size = 1.8
) +
labs(title = "Denoised")
The prominent changes of the coloring between surface plots from the
scaled expression matrix (Figure 3.2) and from the denoised one (Figure
3.3) results from the autoencoder network returning normally distributed
values while the majority of results from count normalization and
scaling are not normally distributed. To visualize differences between
the scaled and the denoised expression matrix you can either use
functions that visualize the distribution of numeric variables (such as
plotDensityplot()
, plotViolinplot()
etc.) in
combination with setActiveMatrix()
as shown in Figure 3.4
or you can use the function plotAutoencoderResults()
as
shown in Figure 3.5.
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "scaled")
plotViolinplot(
object = spata_obj,
variables = example_genes,
display_facets = FALSE
) +
labs(title = "Scaled (not denoised)")
spata_obj <- setActiveMatrix(object = spata_obj, mtr_name = "denoised")
plotViolinplot(
object = spata_obj,
variables = example_genes,
display_facets = FALSE
) +
labs(title = "Denoised")
plotAutoencoderResults(
object = spata_obj,
genes = example_genes,
pt_alpha = 0.4,
pt_size = 1.25
)