Enable Medicine Logo
enablemedicine.com

Overview

We recently added functionality in SpatialMap to conduct “spatial QC”, which allows us to place additional QC filters on cells that are spatially far apart from their neighbors (“spatial outliers”). This is beneficial as there may be cells that are segmented but should be excluded from analysis, for example, because they are detached from the main tissue.

library(emconnect)
library(SpatialMap)
library(facil)
library(tidyverse)
library(magrittr)
library(ggplot2)
library(patchwork)
library(gghighlight)
# Establish em connection
con <- emconnect()

Example regions

To demonstrate the current spatial QC workflow, we will use two of the regions from a human skin dataset that could benefit from a toolset like this.

sm <- spatialmap_from_db(
  acquisition_ids = c("shu-66586",
                      "qvg-87880")
)
# save sm just in case!
saveRDS(sm, file.path(data_dir, "sm_spqc.RDS"))
# arcsinh normalization for qc filters
sm <- sm %>% 
  Normalize(method = "asinh", 
            from = "Data", 
            to = "NormalizedData")

Standard QC results

Here are QC results from our standard QC workflow (four QC parameters). The steps are not explained in detail as we have other resources on this topic.

new.settings <- c(
  min_signal.sum = 5,
  max_signal.sum = 100,
  min_DNA = 200,
  max_DNA = 15000,
  min_cell.size = 100,
  max_cell.size = 1600,
  min_signal.cv = 0.5,
  max_signal.cv = 2
)

settings(sm, names(new.settings)) <- new.settings
sm <- sm %>% 
  QCMetrics(slot = "NormalizedData") %>% 
  FilterCodex()
plotQCSummary(sm, group.variable = "region_display_label") %>% 
  wrap_plots()

Below are the spatial representations of the tissues, with each cell colored by its QC result.

plotFilteredCells(sm, plot.titles.from = "region_display_label") %>% 
  lapply(\(x) x + theme(legend.position = "none")) %>% 
  wrap_plots() + 
  theme(legend.position = "right") + 
  plot_layout(guides = "collect")

Below are the cells that would remain after removing those that failed QC.

sm %>% 
  FilterCodex(remove.filtered.cells = TRUE) %>% 
  plotFilteredCells(plot.titles.from = "region_display_label") %>% 
  lapply(\(x) x + theme(legend.position = "none")) %>% 
  wrap_plots() +
  plot_annotation(title = "cells remaining after standard QC")

Here we can see that there are cells remaining after standard QC that are spatially separated from the main tissue of interest. How can we remove these “spatial outliers” using “spatial QC” functionality?

Spatial QC workflow

Compute QC metrics

When QCMetrics is run and spatial.qc = TRUE, the function spatialQC is called under the hood. This allows us to now use the neighbor properties of cells, or graph partition methods to evaluate the “spatial quality” of cells. We can also perform QC filtering using these properties using FilterCodex by setting spatial.qc = TRUE.

Currently, we compute three spatial QC metrics for each cell:

  1. dist.n: Distance to the cell’s nth nearest neighbor
  2. dist.smooth: The average of dist.n across the cell’s k nearest neighbors
  3. partition: “Spatial cluster” assignment from running the Leiden algorithm (low-resolution) on the region’s spatial embedding

QCMetrics passes values for n, k, and partition_resolution to spatialQC. See ?spatialQC for more information.

sm <- sm %>% 
  QCMetrics(slot = "NormalizedData",
            spatial.qc = TRUE,
            n = 20, # looking at distance to 20th nearest neighbor
            k = 20, # smoothing on 20 nearest neighbors
            partition_resolution = 0.3 # low resolution!
  ) %>% 
  FilterCodex(spatial.qc = TRUE)

When we run plotQCSummary and plotFilterSummary, we now see that dist.n and dist.smooth appear as reasons for cells failing QC. Let’s focus more on making adjustments to these QC parameters.

plotQCSummary(sm, 
              group.variable = "region_display_label") %>% 
  wrap_plots()

plotFilterSummary(sm, 
                  group.variable = "region_display_label") %>% 
  wrap_plots(
    design = "#11#
              2233")

Adjust spatial QC parameters

Focus: dist.n

Looking at the QC summary above, we can see that most cells fall between the threshold lines for dist.n. While the histograms look decent, since this is a spatial metric, it might be more valuable to visually inspect how this filter performed by looking at the spatial representation of the tissues. In the plots below, we see the regions with each cell colored by the value of cellqc_dist.n.

plotRepresentation(sm, "spatial", 
                   "cellqc_dist.n", 
                   plot.titles.from = "region_display_label") %>% 
  lapply(\(x) x + theme(legend.position = "bottom")) %>% 
  wrap_plots()

This gives us a clearer idea of the tissue structure, but it is still difficult to tell what a reasonable cutoff would be. For example, we can see that there are several single cells away from the main tissue that should probably be filtered for both regions, but since the color scale differs for each region, we need a better way to determine an appropriate threshold.

We can preview how different thresholds might affect the QC filters using the custom function spatial_qc_plots (its definition is included at the end of this file). In the plots below, all cells with cellqc_dist.n above the threshold of 500 are colored red, and the color scale is shared across all regions, making it easier to interpret and guide decisions around spatial filtering. You can change the parameters for qc.var, thresh, and pal.type as desired.

spatial_qc_plots(sm,  
                 qc.var = "cellqc_dist.n", 
                 thresh = 500, 
                 pal.type = "c") %>%
  lapply(\(x) x + theme(legend.position = "none")) %>% 
  wrap_plots() +
  theme(legend.position = "right") +
  plot_layout(guides = "collect")

For both regions, it appears that a threshold of 500 would do a good job of filtering out most of the off-tissue single “spatial outliers” (red). We can set this as our new max threshold for dist.n and re-filter our cells.

Note: For the second region (pas-20-09454 A1 (reg 4)), we see that there is a larger clusters of cells away from the main tissue (to the left) that we might also want to filter out–we will do this with the next filtering approach.

# new settings
settings(sm, "max_dist.n") <- 500

# re-filter
sm <- sm %>% FilterCodex(spatial.qc = TRUE)

We can view the QC results once again here.

plotFilteredCells(sm, 
                  plot.titles.from = "region_display_label") %>%  
  lapply(\(x) x + theme(legend.position = "bottom"))

sm %>% 
  FilterCodex(spatial.qc = TRUE,
              remove.filtered.cells = TRUE) %>% 
  plotFilteredCells(plot.titles.from = "region_display_label") %>% 
  lapply(\(x) x + theme(legend.position = "none")) %>% 
  wrap_plots() +
  plot_annotation(title = "cells remaining after spatial QC with `dist.n`")

Focus: cellqc_partition

To improve spatial QC for the region pas-20-09454 A1 (reg 4), we might consider taking advantage of the computed partition, since the “spatial outliers” in these region are spatially clustered further from the main tissue.

Below is the spatial representation of the cells in this region, colored by partition label.

plotRepresentation(sm[[1]], "spatial", "cellqc_partition",
                   plot.titles.from = "region_display_label")

We see that four “clusters” to the left of the main tissue indeed cluster together–in cl10 and cl17.

plotRepresentation(sm[[1]], "spatial", "cellqc_partition",
                   plot.titles.from = "region_display_label") +
  gghighlight(grepl(paste(c("10", "17"), collapse = "|"), variable))

Using these partition labels directly as the exclude_partitions setting should allow us to filter these cells out. Because we only want to do this for one of the two regions, let’s separate that one out into its own SpatialMap object for this particular filtering step, then add it back to the original object.

# subset the SM object
sm_1 <- sm[1]

# new settings -- NOTE that they are separated by a SPACE
settings(sm_1, "exclude_partitions") <- "cl10_1 cl17_1"

# re-filter
sm_1 <- sm_1 %>% FilterCodex(spatial.qc = TRUE)

# recombine with first region to recreate original SM object
sm <- combineMaps(c(sm_1, sm[2]))

After updating the QC settings and refiltering, we now see that partition is now one of the most frequent modes of QC failure pas-20-09454 A1 (reg 4)!

plotFilterSummary(sm, 
                  group.variable = "region_display_label") %>% 
  wrap_plots(
    design = "#11#
              2233")

Remove filtered cells

Once we are satisfied with our QC parameters, we can preview the result of removing filtered cells by running the code below.

sm %>% 
  FilterCodex(recompute = FALSE,
              spatial.qc = TRUE,
              remove.filtered.cells = TRUE) %>% 
  plotFilteredCells(plot.titles.from = "region_display_label") %>% 
  lapply(\(x) x + theme(legend.position = "none")) %>% 
  wrap_plots() + 
  plot_annotation(title = "cells remaining after standard + spatial QC")

To actually remove the filtered cells, we run FilterCodex again, setting remove.filtered.cells = TRUE.

sm <- sm %>% 
  FilterCodex(recompute = FALSE,
              spatial.qc = TRUE,
              remove.filtered.cells = TRUE)

Now we can save our SpatialMap object, and proceed with analyzing only the cells that passed standard and spatial QC.

saveRDS(sm, file.path(data_dir, "sm_post-spatialQC.RDS"))

Appendix

spatial_qc_plots

spatial_qc_plots <- function(object, 
                             qc.var = "cellqc_dist.smooth", 
                             thresh = NULL, 
                             pal.type = "b", # either b(inned) or c(ontinuous)
                             n.breaks = 4 # only used if binned
) {
  .smapply(object, 
           function(reg) {
             rdl <- unique(cellMetadata(reg)[["region_display_label"]])
             
             my.df <- data.frame(embeddings(reg, "spatial"), 
                                 variable = cellMetadata(reg)[[qc.var]])
             
             if (is.null(thresh)) {
               p <- ggplot() + 
                 geom_point(data = my.df, 
                            aes(x, y, color = variable),
                            size = 0.5)
             } else {
               my.df_above <- filter(my.df, variable >= thresh)
               my.df_below <- filter(my.df, variable < thresh)
               
               p <- ggplot() + 
                 geom_point(data = my.df_below, 
                            aes(x, y, color = variable),
                            size = 0.5) +
                 geom_point(data = my.df_above,
                            aes(x, y), color = "red",
                            size = 0.5) +
                 labs(title = rdl,
                      subtitle = paste("red: above", thresh))
             }
             
             if (pal.type == "b") {
               p <- p + scale_color_viridis_b(n.breaks = n.breaks)
             } else if (pal.type == "c") {
               p <- p + scale_color_viridis_c()
             } else {
               cat("pal.type should either be 'b' or 'c'. using binned")
               p <- p + scale_color_viridis_c()
             }
             
             p <- p + labs(title = rdl,
                           color = qc.var) +
               ggthemes::theme_few() +
               theme(title = element_text(hjust = 0.5)) +
               coord_fixed()
             
             p
           }, 
           cat.output = "none")
  }