Enable Medicine Logo
enablemedicine.com

Overview

This document is meant to serve as a template for creating a custom QC filter generated by drawing ROIs in the visualizer.

You can optionally specify a pre-existing QC filter generated via either the QC app or workbench. If this QC filter is one that was generated on workbench, it should be one that has already been pushed to the Portal and had time to sync.

Prerequisites

This notebook assumes that you have already created your ROI annotations, and followed these guidelines when doing so:

  1. Created a single visualizer story with a set of notes marking the ROIs you would like excluded from analysis. Cells inside these ROIs will be marked as failing QC and excluded from any analysis that includes this

  2. Named the notes in this visualizer story following a consistent pattern (e.g. “ROI_QC_{region_name}”). Input the pattern as the value for the ROI_annotation_pattern parameter in the header of this file.

  3. Export these ROIs as a annotation sets using the Generate Cell Annotations tool in the visualizer. Each note will create a new annotation set, with a name based on the title of the note. Cells outside the ROIs defined by each note will be labeled “Other”.

Code setup

knitr::opts_chunk$set(echo = TRUE)

library(emconnect)
library(SpatialMap)
library(tidyverse)
library(stringr)
# Run this chunk before knitting!
con <- emconnect()

# If pushing to portal, initialize Portal connection
if (params$push_to_portal) {
  pt <- initialize_Portal_connection(quiet = TRUE)
}
# These values are defined in and passed from the header
STUDY_ID <- params$study_id
QC_ANNO_ID <- params$qc_annotation_id
ROI_ANNO_PATTERN <- params$ROI_annotation_pattern
BIOMARKER_EXPRESSION_VERSION <- params$biomarker_expression_version
NEW_ANNO_NAME <- params$new_anno_name
NEW_ANNO_DESCRIPTION <- params$new_anno_description


data_dir <- normalizePath(params$data_dir)

# stop if data_dir doesn't exist
if (!dir.exists(data_dir)) {
  msg <- str_glue("The directory you provided does not exist!:\n\t {data_dir}")
  stop(msg)
}

Pulling data

Let’s start by figuring out which annotation sets will be needed for this workflow. We’ll look for any annotation sets that match the note title pattern specified as ROI_ANNO_PATTERN. This should only include notes from this visualizer story.

all_annos <- list_cell_annotations(study_ids = STUDY_ID) %>%
  bind_rows() %>%
  select(annotation_id, annotation_name) %>%
  distinct()

ROI_QC_ann_ids <- all_annos %>%
  filter(grepl(x = annotation_name,
               pattern = ROI_ANNO_PATTERN,
               fixed = TRUE)) %>%
  pull(annotation_id)

annos_to_pull <- ROI_QC_ann_ids

if (length(annos_to_pull) == 0) {
  stop("No ROI annotations found matching pattern!")
}

# If a QC annotation is specified, pull that one too
if (!is.null(QC_ANNO_ID)) {
  annos_to_pull <- c(annos_to_pull, QC_ANNO_ID)
}

We’re going to pull data for every region in the study. We’ll also save the result so our progress is saved in the event something should happen to the session.

sm <- spatialmap_from_db(study_ids = STUDY_ID,
                         expand_metadata = F,
                         expression.version = BIOMARKER_EXPRESSION_VERSION,
                         annotation_ids = annos_to_pull)

saveRDS(sm, file.path(data_dir, "sm_raw.RDS"))

Generate custom QC filter

Next, we’re going to search the annotation sets pulled in the previous step for annotations that match the ROI pattern. Check the results to make sure they’re correct.

all_anno_names <- colnames(cellMetadata(sm))

ROI_anno_names <- all_anno_names[{
  grepl(pattern = ROI_ANNO_PATTERN,
        all_anno_names,
        fixed = TRUE)
}]

ROI_anno_names
##  [1] "anno3069_ROI_QC_1"  "anno3073_ROI_QC_2"  "anno3055_ROI_QC_3" 
##  [4] "anno3057_ROI_QC_4"  "anno3059_ROI_QC_5"  "anno3061_ROI_QC_6" 
##  [7] "anno3063_ROI_QC_7"  "anno3065_ROI_QC_8"  "anno3067_ROI_QC_9" 
## [10] "anno3071_ROI_QC_10"

Now, combine all of the ROI QC annotation sets into a single annotation set. If another QC annotation was provided, combine the ROI annotation with that annotation set as well.

ROI_fail_string <- "ROI_excluded"

cm <- cellMetadata(sm) %>%
  # Join all the ROI annotations into a string so they can all be checked together
  # Helps with compute efficiency
  unite("unified_ROI_QC",
        all_of(ROI_anno_names)) %>%
  # If any of the ROI annotations included a cell, mark it as failing
  mutate(QC_excluded = if_else(grepl(ROI_ANNO_PATTERN, unified_ROI_QC, fixed = TRUE),
                               ROI_fail_string, "pass"))

# If you specified a set of QC filters to combine with the ROI annotations, do that here
if (!is.null(QC_ANNO_ID)) {
  QC_anno_name_pattern <- paste0("anno", QC_ANNO_ID)
  QC_anno_name <- all_anno_names[{
    grepl(QC_anno_name_pattern,
          all_anno_names,
          fixed = TRUE)
  }]
  
  # Logic: 
  # 1. If both pass, mark cell as passing
  # 2. If only ROI fail, mark as ROI fail
  # 3. If only QC filter fail, copy that QC result
  # 4. If both fail, append ROI fail to QC result
  
  cm <- mutate(cm,
               QC_excluded = case_when(.data[[QC_anno_name]] == "pass" &
                                         QC_excluded == "pass" ~
                                         "pass",
                                       .data[[QC_anno_name]] == "pass" &
                                         QC_excluded == ROI_fail_string ~
                                         ROI_fail_string,
                                       .data[[QC_anno_name]] != "pass" &
                                         QC_excluded == "pass" ~
                                         .data[[QC_anno_name]],
                                       .data[[QC_anno_name]] != "pass" &
                                         QC_excluded == ROI_fail_string ~
                                         sprintf("%s, %s",
                                                 .data[[QC_anno_name]],
                                                 ROI_fail_string),
                                       TRUE ~ "MISSING CASE"))
  
  if (any(cm$QC_excluded == "MISSING CASE")) {
    stop("Error in annotation combination logic")
  }
}

Push custom QC back to the Portal

First, add the updated metadata back to the sm object.

# Pull out new values, keeping the association with cell.ids
custom_QC <- cm$QC_excluded
names(custom_QC) <- rownames(cm)
# add back
sm <- addCellMetadata(sm, custom_QC, col.names = "custom_QC")

Now, push this combined annotation back to the Portal as a QC filter.

uploadQC(sm,
         metadata_names = "custom_QC",
         uploaded_name = NEW_ANNO_NAME,
         annotation_description = NEW_ANNO_DESCRIPTION)

Also can be useful to save the final version of the object for further work on Workbench. This annotation set should be available when pulling with spatialmap_from_db after the next Snowflake update.

saveRDS(sm, file.path(data_dir, "sm_with_ROI_QC_anno.RDS"))

Session info

Click to expand

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] forcats_0.5.2     stringr_1.4.0     dplyr_1.0.10      purrr_1.0.0      
##  [5] readr_2.1.2       tidyr_1.2.1       tibble_3.1.8      ggplot2_3.3.6    
##  [9] tidyverse_1.3.2   SpatialMap_0.5.45 emconnect_1.9.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.11         svglite_2.1.0       lubridate_1.9.0    
##  [4] prettyunits_1.1.1   assertthat_0.2.1    digest_0.6.31      
##  [7] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0   
## [10] odbc_1.3.3          backports_1.4.1     reprex_2.0.2       
## [13] evaluate_0.19       httr_1.4.4          pillar_1.8.1       
## [16] progress_1.2.2      rlang_1.1.1         googlesheets4_1.0.1
## [19] readxl_1.4.1        blob_1.2.4          jquerylib_0.1.4    
## [22] rmarkdown_2.14      googledrive_2.0.0   bit_4.0.5          
## [25] munsell_0.5.0       broom_1.0.1         compiler_4.1.2     
## [28] modelr_0.1.10       xfun_0.36           pkgconfig_2.0.3    
## [31] systemfonts_1.0.4   htmltools_0.5.4     tidyselect_1.2.0   
## [34] fansi_1.0.3         crayon_1.5.2        withr_2.5.0        
## [37] tzdb_0.3.0          dbplyr_2.2.1        grid_4.1.2         
## [40] jsonlite_1.8.0      gtable_0.3.1        lifecycle_1.0.3    
## [43] DBI_1.1.3           magrittr_2.0.3      scales_1.2.1       
## [46] pbapply_1.6-0       cli_3.4.1           stringi_1.7.8      
## [49] cachem_1.0.6        fs_1.5.2            ggthemes_4.2.4     
## [52] xml2_1.3.3          bslib_0.4.2         ellipsis_0.3.2     
## [55] tripack_1.3-9.1     generics_0.1.3      vctrs_0.6.3        
## [58] tools_4.1.2         bit64_4.0.5         facil_0.2.3        
## [61] glue_1.6.2          hms_1.1.3           parallel_4.1.2     
## [64] fastmap_1.1.0       yaml_2.3.6          timechange_0.1.1   
## [67] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3        
## [70] knitr_1.41          haven_2.5.1         sass_0.4.4