Image Annotation Screening - Code & Concept • SPATA2

1. Prerequisites

Make sure to be familiar with the following vignettes:

2. Introduction & overview

Image Annotation Screening (IAS) was developed to find genes whose spatial expression stand in meaningful relation to histological areas. It makes use of SPATA2’s inbuilt image annotation system that allows to highlight and integrate the spatial extent of areas of interest in the analysis in form of image annotations. This vignette explains the concept of the IAS-algorithm step by step and provides code chunks needed to reproduce the results.

The algorithm can be divided in three parts.

1.) Binning barcode-spots depending on their distance to the annotated area.
2.) Inferring gene expression changes as a function of distance to the image annotation.
3.) Screening gene expression changes for biologically interesting behaviors by fitting the changes against predefined models.

Here, all steps of part 1.) and 2.) of the algorithm are shown. Part 3.) is explained in the tutorial on Model fitting in spatial transcriptomic studies.

As an example we are using a spatial transcriptomic sample of a glioblastoma that features a prominent necrotic area in it’s center. This area has been annotated with createImageAnnotations(). To reproduce the results of this vignette you can either create the image annotation yourself or add it with the code you find below.

library(SPATA2)
library(SPATAData)
library(tidyverse)

object_t313 <- downloadSpataObject(sample_name = "313_T")

# create the image annotation interactively
object_t313 <- createImageAnnotations(object = object_t313)

# or set the example annotation
data("image_annotations")

object_t313 <- 
  setImageAnnotation(
    object = object_t313,
    img_ann = image_annotations[["313_T"]][["necrotic_center"]], 
    overwrite = TRUE
  )

necrotic_area <- 
  ggpLayerImgAnnOutline(
    object = object_t313,
    ids = "necrotic_center",
    line_size = 1, 
    alpha = 0.1, 
    fill = "orange"
  )

# extract the polygon data.frame that contains
# information of the area of the image annotation
area_df <- 
  getImgAnnOutlineDf(
    object = object_t313,
    ids = "necrotic_center"
  )

rectangular <- 
  ggpLayerRect(
    object = object_t313,
    xrange = range(area_df[["x"]]),
    yrange = range(area_df[["y"]]), 
    expand = 0.05
  )

# plot results
plotImageGgplot(object = object_t313) + 
  rectangular + 
  necrotic_area

plotImageAnnotations(
  object = object_t313,
  ids = "necrotic_center",
  expand = 0.05, 
  line_size = 1,
  alpha = 0,
  display_caption = FALSE
)

3. Image annotation screening step by step

3.1 Binning barcode-spots depending on their distance to the image annotation.

The process of binning barcode-spots according to their distance to the image annotation can be splitted in two steps:

1. Obtain information about the borders of the image annotation.
1. Expand the borders of the image annotation repeatedly to consecutively bin the barcode-spots depending on the number of expansions necessary to reach them.

3.1.1 Obtain information about the borders of the image annotation

Mapping the x- and y-coordinates of the barcode-spots to the respective x- and y-aesthetics results in a basic surface plot.

# table on the right
coords_df <- 
  getCoordsDf(object = object_t313) %>% 
  select(-sample)

theme_add_on <- 
  theme(
    panel.grid.major = element_blank(), 
    panel.grid.minor = element_blank(),
    axis.title = element_blank()
  )

# plot on the left
surface_plot <- 
  ggplot(data = coords_df, mapping = aes(x = x, y = y)) + 
  geom_point(color = "lightgrey", alpha = 0.5, size = 2) + 
  coord_equal() +
  theme_bw() +
  theme_add_on

## # A tibble: 3,517 x 5
##    barcodes               x     y section outline
##    <chr>              <dbl> <dbl> <chr>   <lgl>  
##  1 AAACAAGTATCTCCCA-1 1456. 1241. 1       FALSE  
##  2 AAACAATCTACTAGCA-1  781.  309. 1       FALSE  
##  3 AAACACCAATAACTGC-1  509. 1421. 1       FALSE  
##  4 AAACAGAGCGACTCCT-1 1364.  527. 1       FALSE  
##  5 AAACAGCTTTCAGAAG-1  394. 1104. 1       FALSE  
##  6 AAACAGGGTCTATATT-1  440. 1183. 1       FALSE  
##  7 AAACATGGTGAGAGGA-1  292. 1481. 1       TRUE   
##  8 AAACCCGAACGAAATC-1 1604. 1142. 1       FALSE  
##  9 AAACCGGGTAGGTACC-1  611. 1084. 1       FALSE  
## 10 AAACCGTTCGTCCAGG-1  771. 1282. 1       FALSE  
## # i 3,507 more rows

The area of an image annotation is circumscribed by a polygon that is created during the drawing process in createImageAnnotations() where every few milliseconds the position of the cursor is stored in a data.frame. The corresponding data.frame can be extracted with getImgAnnOutlineDf().

# table on the right
area_df <- 
  getImgAnnOutlineDf(
    object = object_t313,
    ids = "necrotic_center"
    )

# save image annotation layer
necrotic_area <- 
  geom_polygon(
    data = area_df, 
    mapping = aes(x = x, y = y, group = ids), 
    size = 1, 
    fill = NA, 
    color = "black"
  )

# plot results on the left
surface_plot + 
  necrotic_area

## # A tibble: 286 x 4
##    ids             border     x     y
##    <fct>           <chr>  <dbl> <dbl>
##  1 necrotic_center outer   512. 1378.
##  2 necrotic_center outer   512. 1363.
##  3 necrotic_center outer   510. 1354.
##  4 necrotic_center outer   505. 1350.
##  5 necrotic_center outer   505. 1347.
##  6 necrotic_center outer   500. 1341.
##  7 necrotic_center outer   498. 1334.
##  8 necrotic_center outer   498. 1330.
##  9 necrotic_center outer   498. 1323.
## 10 necrotic_center outer   500. 1312.
## # i 276 more rows

Depending on the input for argument include_area, IAS included the area of the annotatio itself or only screens the surrounding of the image annotation. Here, we only screen the surrounding. Therefore, the barcode-spots that fall in the area of the image annotation can be removed.

# create variable that contains information about
# the position of all barcode-spots in relation to 
# the image annotation area
coords_df$pt_in_plg <-
  sp::point.in.polygon(
    point.x = coords_df$x,
    point.y = coords_df$y,
    pol.x = area_df$x,
    pol.y = area_df$y
  )

# remove barcode-spots that lie inside the polygon
filtered_coords_df <- 
  filter(coords_df, !pt_in_plg %in% c(1,2))

ias_surface_plot <- 
  ggplot(data = filtered_coords_df, mapping = aes(x = x, y = y)) + 
  geom_point(color = "lightgrey", alpha = 0.5, size = 2) + 
  coord_equal() +
  theme_bw() +
  theme_add_on

ias_surface_plot + 
  necrotic_area

3.1.2 Bin barcode-spots by consecutive expansion of the area

The package sf provides the functions st_polygon() and st_buffer() that can be used to expand the area. To sort the barcode-spots in the corresponding bins, the function sp::point.in.polygon() is used to label them.

# plot on the LEFT ----------

# add first row as the last row to "close" the area
# required by st_polygon()
closed_area_df <- area_df

closed_area_df[nrow(closed_area_df) + 1, ] <- closed_area_df[1,]

binwidth <- "0.2mm"

binwidth <- as_pixel(binwidth, object = object_t313)

print(binwidth)

## [1] 45.78391
## attr(,"unit")
## [1] "px"

# grow by a buffer 20 
area_df_grown20 <-
  sf::st_polygon(x = list(as.matrix(closed_area_df[,c("x", "y")]))) %>%
  sf::st_buffer(dist = binwidth) %>% # <- grows/buffers the polygon
  as.matrix() %>%
  as.data.frame() %>%
  magrittr::set_colnames(value = c("x", "y")) %>%
  mutate(ids = "necrotic_center") %>% 
  as_tibble()

# grow by a buffer of 40
area_df_grown40 <-
  sf::st_polygon(x = list(as.matrix(closed_area_df[,c("x", "y")]))) %>%
  sf::st_buffer(dist = (binwidth*2)) %>% # <- grows/buffers the polygon
  as.matrix() %>%
  as.data.frame() %>%
  magrittr::set_colnames(value = c("x", "y")) %>%
  mutate(ids = "necrotic_center") %>% 
  as_tibble()

necrotic_area_grown20 <- 
  geom_polygon(
    data = area_df_grown20, 
    mapping = aes(x = x, y = y, group = ids), 
    size = 1, 
    fill = NA, 
    color = "red"
  )

necrotic_area_grown40 <- 
  geom_polygon(
    data = area_df_grown40, 
    mapping = aes(x = x, y = y, group = ids), 
    size = 1, 
    fill = NA, 
    color = "blue"
  )

plot_area_expansion <- 
  ias_surface_plot + 
  necrotic_area + 
  necrotic_area_grown20 + 
  necrotic_area_grown40 + 
  labs(subtitle = "a) Area expansion")

# -----
  

# plot on the RIGHT ----------
area_list <-
  list(
    "Circle 1" = area_df_grown20, 
    "Circle 2" = area_df_grown40
  )

filtered_coords_df$bins_circle <- "Outside"

# iterate over list names (Circle 1, Circle 2)
for(expansion in names(area_list)){
  
  area_df_expanded <- area_list[[expansion]]
  
  filtered_coords_df$pt_in_plg <- 
    sp::point.in.polygon(
      point.x = filtered_coords_df$x, 
      point.y = filtered_coords_df$y, 
      pol.x = area_df_expanded$x,
      pol.y = area_df_expanded$y
    )
  
  filtered_coords_df <- 
    mutate(
      .data = filtered_coords_df, 
      bins_circle = case_when(
          # if bins_circle is NOT 'Outside' it has already bin binned by previous expansions
        # label with `expansion` (which is either 'Circle 1' or 'Circle 2')
        bins_circle == "Outside" & pt_in_plg %in% c(1,2) ~ {{expansion}},
        # else keep label 
        TRUE ~ bins_circle
      )
    )
  
}

plot_binned <- 
  ggplot(data = filtered_coords_df, mapping = aes(x = x, y = y)) + 
  geom_point(mapping = aes(color = bins_circle)) + 
  scale_color_manual(
    values = c(
      "Outside" = "lightgrey",
      "Circle 1" = "red", 
      "Circle 2" = "blue"
    )
  ) +
  coord_equal() +
  theme_bw() +
  theme_add_on + 
  labs(subtitle = "b) Binned") + 
  legendNone()

# -----
  
plot_area_expansion
plot_binned

How the binning is conducted and the screening is configurated depends on a few parameters. The two most important ones are:

1. distance: The distance from the border of the image annotation to the horizon in the periphery up to which the screening is conducted.
1. binwidth: The binwidth of every bin created as displayed above.

Both, distance and binwidth are distance measures and can be specified in pixel or in SI units. We recommend SI units as it makes the scripts reproducible in case of changing image resolutions. See our vignette on spatial measures in SPATA2 for more information about SI units are handled.

The function plotSurfaceIAS() allows to visualize the IAS setup. The binwidth of the example above has been set to 20 for visualization purposes. By default the IAS-algorithm uses a binwidth that is equal to the distance between the barcode-spots. This results in bins that include one layer of barcode-spots each.

# using the center to center distance of barcodes as a binwidth
bcsp_dist <- getCCD(object_t313)

# show results
bcsp_dist

## 100 [um]

plist <- 
  plotSurfaceIAS(
    object = object_t313,
    id = "necrotic_center",
    distance = "2mm", 
    pt_clrp = "milo", # use colorful palette to highlight the bins
    binwidth = bcsp_dist, 
    ggpLayer = ggpLayerThemeCoords(),
    show_plots = FALSE
  )

plist2 <- 
  plotSurfaceIAS(
    object = object_t313,
    id = "necrotic_center",
    distance = "2mm", 
    pt_clrp = "inferno", # use continuos palette to highlight the 'direction'
    binwidth = bcsp_dist, 
    ggpLayer = ggpLayerThemeCoords(),
    show_plots = FALSE
  )

# plot results 
plist$bins_circle + 
  labs(subtitle = "a) Colorful")

plist2$bins_circle + 
  labs(subtitle = "b) Continuous")

The number of bins created this way is expressed in the function: \(n.bins.circle = \frac{distance}{binwidth}\)

It is possible to specify n_bins_circle which makes it the third argument that can be used to adjust the screening.

1. n_bins_circle: The number of bins created within the range from the image annotations border till the horizon where the screening stops.

If so, either distance or binwidth must not be specified because the third parameter is always calculated according to the first and the second one. Accordingly, the relation between the arguments stays the same regardless of which argument is not specified.

If distance is not specified: \(distance = binwidth * n.bins.circle\)

If binwidth is not specified: \(binwidth = \frac{distance}{n.bins.circle}\)

# n_bins_circle = 20
# binwidth = 100um
# distance = n_bins_circle * binwidth = 20 * 100um = 2000um = 2mm

plist3 <- 
  plotSurfaceIAS(
    object = object_t313,
    id = "necrotic_center",
    n_bins_circle = 20,
    binwidth = bcsp_dist, 
    distance = NA_integer_, # NA as n_bins_circle and binwidth were specified
    ggpLayer = ggpLayerThemeCoords(),
    show_plots = FALSE
  )

# plot results
plist3$bins_circle

3.2 Inferring gene expression changes

The code to expand the area above is wrapped up inside the helper function bin_by_expansion(). The binning can be displayed in form of surface plots.

# distance of 3.3mm covers the whole extent of the sample 
distance <- "3.3mm"

# convert to pixel
distance <- as_pixel(distance, object = object_t313)
bcsp_dist <- as_pixel(bcsp_dist, object = object_t313)

distance

## [1] 755.4344
## attr(,"unit")
## [1] "px"

bcsp_dist

## [1] 22.89195
## attr(,"unit")
## [1] "px"

# round up with ceiling()
n_bins <- ceiling(distance/bcsp_dist)

# show results 
n_bins

## [1] 33
## attr(,"unit")
## [1] "px"

binned_coords <-
  bin_by_expansion(
    coords_df = coords_df, # the coordinates of the barcode spots
    area_df = area_df, # the polygon that encircles the area of interest
    binwidth = bcsp_dist,
    n_bins_circle = n_bins # n_bins = 30
  )

# show results
binned_coords

## # A tibble: 3,517 x 8
##    barcodes               x     y section outline bins_circle border bins_order
##    <chr>              <dbl> <dbl> <chr>   <lgl>   <fct>       <chr>       <dbl>
##  1 AAACAGGGTCTATATT-1  440. 1183. 1       FALSE   Core        outer           0
##  2 AAACCGGGTAGGTACC-1  611. 1084. 1       FALSE   Core        outer           0
##  3 AAACCGTTCGTCCAGG-1  771. 1282. 1       FALSE   Core        outer           0
##  4 AAACCTCATGAAGTTG-1  508.  985. 1       FALSE   Core        outer           0
##  5 AAACTTGCAAACGTAT-1  509. 1143. 1       FALSE   Core        outer           0
##  6 AAAGGCTACGGACCAT-1  908. 1480. 1       FALSE   Core        outer           0
##  7 AAAGGCTCTCGCGCCG-1  920. 1341. 1       FALSE   Core        outer           0
##  8 AAAGGGCAGCTTGAAT-1  588.  727. 1       FALSE   Core        outer           0
##  9 AAAGTAGCATTGCTCA-1  600. 1262. 1       FALSE   Core        outer           0
## 10 AAATCGTGTACCACAA-1  931. 1123. 1       FALSE   Core        outer           0
## # i 3,507 more rows

# a distance of 200 spans the whole sample
plot_binned <- 
  plotSurface(
    object = filter(binned_coords, bins_circle != "Core"),
    color_by = "bins_circle",
    pt_clrp = "inferno"
  )

# plot results
plot_binned

By summarizing the gene expression by these circular bins the gene expression changes in relation to the distance to the area of interest can be inferred and linearly displayed. To achieve that, the genes of interest are joined to the data.frame that contains the binning information.

# example genes
genes_of_interest <- c("FN1", "MYL6", "MCTS1")

joined_df <- 
  joinWithGenes(
    object = object_t313, 
    spata_df = binned_coords, 
    genes = genes_of_interest
  ) %>% 
  select(-x, -y) # not needed any longer

# show results 
joined_df

## # A tibble: 3,517 x 9
##    barcodes      section outline bins_circle border bins_order   FN1  MYL6 MCTS1
##    <chr>         <chr>   <lgl>   <fct>       <chr>       <dbl> <dbl> <dbl> <dbl>
##  1 AAACAGGGTCTA~ 1       FALSE   Core        outer           0 0.811 0         0
##  2 AAACCGGGTAGG~ 1       FALSE   Core        outer           0 0     0         0
##  3 AAACCGTTCGTC~ 1       FALSE   Core        outer           0 0.716 0         0
##  4 AAACCTCATGAA~ 1       FALSE   Core        outer           0 0.570 0         0
##  5 AAACTTGCAAAC~ 1       FALSE   Core        outer           0 0.447 0.694     0
##  6 AAAGGCTACGGA~ 1       FALSE   Core        outer           0 0.813 0.257     0
##  7 AAAGGCTCTCGC~ 1       FALSE   Core        outer           0 0.646 0.195     0
##  8 AAAGGGCAGCTT~ 1       FALSE   Core        outer           0 0.531 0.366     0
##  9 AAAGTAGCATTG~ 1       FALSE   Core        outer           0 0.441 0         0
## 10 AAATCGTGTACC~ 1       FALSE   Core        outer           0 0.648 0         0
## # i 3,507 more rows

Summarizing the genes by bin and shifting the perspective allows to plot the three genes along the variable bins_circle and to display the gene expression as a function of distance to the necrotic area.

smrd_df <- 
  # group barcode-spots by bin
  group_by(joined_df, bins_circle) %>% 
  # summarize gene expression
  summarize(
    dplyr::across(
      .cols = {{genes_of_interest}}, 
      .fns = ~ mean(.x)
    )
  ) %>% 
  ungroup() %>% 
  # rescale gene expression
  mutate(
    dplyr::across(
      .cols = {{genes_of_interest}}, 
      .fns = ~ normalize(.x) # ensure a range from 0 to 1
    ), 
    bins_numeric = (as.numeric(bins_circle)-1) # convert to numeric variable
  ) %>% 
  select(bins_numeric, everything(), -bins_circle)

# show results 
smrd_df

## # A tibble: 34 x 4
##    bins_numeric   FN1  MYL6 MCTS1
##           <dbl> <dbl> <dbl> <dbl>
##  1            0 0.610 0     0.304
##  2            1 0.961 0.151 0.283
##  3            2 1     0.241 0.342
##  4            3 0.970 0.222 0.763
##  5            4 0.944 0.308 0.507
##  6            5 0.982 0.280 0.563
##  7            6 0.920 0.324 0.523
##  8            7 0.871 0.327 0.840
##  9            8 0.855 0.371 0.745
## 10            9 0.843 0.396 0.643
## # i 24 more rows

Using scatter- or lineplots, the expression change of a gene can be plotted against the distance to the necrotic area.

# create plot
shifted_df <- 
  pivot_longer(
    data = smrd_df, 
    cols = {{genes_of_interest}}, 
    names_to = "genes", 
    values_to = "values"
  ) 

ggplot(shifted_df, mapping = aes(x = bins_numeric, y = values)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  geom_vline(xintercept = 0.5, linetype = "dashed") + # indicate image annotation border
  facet_wrap(facets = . ~ genes, nrow = 1) + 
  theme_classic() + 
  theme(
    axis.line.x = element_line(
      arrow = arrow(length = unit(0.075, "inches"), type = "closed")
      ),
  ) +
  labs(x = "Circular Bins")

While genes FN1 and MYL6 follow a clear descending or ascending trend gene MCTS1 does not. The corresponding surface plots visualize that, too. FN1 is expressed highly in close proximity to the necrotic_center. The expression levels decrease with the distance to the area (model: linear descending). The opposite is true for gene MYL6 which increases with the distance to the necrotic area (model: linear ascending). Note, that the dashed line indicates the positioning of the image annotation border. The bin left to it corresponds to the mean expression inside the image annotation. Depending on the argument include_area this can be integrated in the visualization or not.

plotSurfaceComparison(
  object = object_t313,
  color_by = genes_of_interest, 
  display_image = FALSE, 
  pt_clrsp = "Greens 3",
  nrow = 1
) + 
  ggpLayerEncirclingIAS(
    object = object_t313, 
    id = "necrotic_center", 
    distance = "2.25mm", 
    binwidth = "225um", 
    line_size = 0.75,
    line_size_core = 1
  )

3.3. Screening gene expression changes

Click here to be guided to the vignette that explains how the inferred expression changes are fitted to models to find genes such as FN1 and MYL6.

Image Annotation Screening - Code & Concept

Jan Kueckelhaus

2022-08-20