
Analysis Guide Part I
Loading SpatialMap Objects from Enable ATLAS
AnalysisGuide1_Loading_sm_object_from_ATLAS.Rmd
Data access and the emconnect package
emconnect is an R package developed by Enable Medicine that facilitates real-time communication between Workbench and ATLAS. Any SpatialMap functions that involve pulling or pushing data to the Portal use emconnect under the hood for that capability. It also has some useful convenience functions that we’ll utilize a bit in this vignette.
Note: while you used to need to wait for daily database exports to access your data, emconnect was recently migrated to a new real-time data access tool that eliminates this delay! You also no longer need to create a database connection to use emconnect: connections to the database and credential checking are mediated by an API key you can generate on your profile page on the Portal. This migration also means that any published data on the Portal can be queried by any user on Workbench!
Set study parameters
It’s good practice to set a few variables in your global environment before getting started - this just makes it more convenient to adapt the code to any future changes.
- STUDY_NAME is the title of your study on the Portal.
- BIOMARKER_EXPRESSION_VERSION specifies which biomarker expression calculation you wish to analyze in this session. In most cases, you will want to analyze the most recent biomarker expression version. See the Appendix below for more details.
- DNA_CHANNEL is the channel for the nuclear signal in your dataset. This channel is handled separately from the other biomarker channels in SpatialMap.
- NEUTRAL_MARKERS are any other channels in your dataset that aren’t typical biomarkers. This slot is flexible, but it often includes “Blank” or background channels. These will also be handled separately from other biomarkers.
STUDY_NAME <- "Immune cell topography predicts response to PD-1 blockade in cutaneous T cell lymphoma"
BIOMARKER_EXPRESSION_VERSION <- 1
DNA_CHANNEL <- "DAPI"
NEUTRAL_MARKERS <- c("Blank") # as a vector, since you may want to pass more than one value
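Before querying, it can help to sanity-check these variables. The sketch below uses only base R; the specific checks and the name non_biomarker_channels are illustrative assumptions, not requirements imposed by SpatialMap or emconnect.

```r
STUDY_NAME <- "Immune cell topography predicts response to PD-1 blockade in cutaneous T cell lymphoma"
BIOMARKER_EXPRESSION_VERSION <- 1
DNA_CHANNEL <- "DAPI"
NEUTRAL_MARKERS <- c("Blank")

# Illustrative checks (assumptions for this sketch, not package requirements)
stopifnot(
  is.character(STUDY_NAME), length(STUDY_NAME) == 1, nzchar(STUDY_NAME),
  is.numeric(BIOMARKER_EXPRESSION_VERSION),
  length(BIOMARKER_EXPRESSION_VERSION) == 1,
  is.character(DNA_CHANNEL), length(DNA_CHANNEL) == 1,
  is.character(NEUTRAL_MARKERS),
  !(DNA_CHANNEL %in% NEUTRAL_MARKERS)  # the DNA channel should not be listed twice
)

# Every channel that will be handled separately from the biomarkers
non_biomarker_channels <- c(DNA_CHANNEL, NEUTRAL_MARKERS)
print(non_biomarker_channels)
```

Combining the two variables this way mirrors how they are passed together as neutral.markers later in this guide.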
Regions in this study
This next chunk uses emconnect to pull the study metadata for the regions in your study, then keeps only the regions that pass visual QC. You can take a moment here to examine your samples, ensure everything looks right, and, if you wish, filter any samples out prior to pulling data into your session. Data can also always be filtered later.
region_table <- get_study_metadata(study_names = STUDY_NAME)
head(region_table)
#> study_id study_name
#> CITN10Co-88_c001_v001_r001_reg001 339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg002 339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg003 339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg004 339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg005 339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg006 339 CITN 10 Codex
#> study_uuid
#> CITN10Co-88_c001_v001_r001_reg001 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg002 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg003 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg004 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg005 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg006 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> acquisition_id
#> CITN10Co-88_c001_v001_r001_reg001 CITN10Co-88_c001_v001_r001_reg001
#> CITN10Co-88_c001_v001_r001_reg002 CITN10Co-88_c001_v001_r001_reg002
#> CITN10Co-88_c001_v001_r001_reg003 CITN10Co-88_c001_v001_r001_reg003
#> CITN10Co-88_c001_v001_r001_reg004 CITN10Co-88_c001_v001_r001_reg004
#> CITN10Co-88_c001_v001_r001_reg005 CITN10Co-88_c001_v001_r001_reg005
#> CITN10Co-88_c001_v001_r001_reg006 CITN10Co-88_c001_v001_r001_reg006
#> visual_quality experiment_id
#> CITN10Co-88_c001_v001_r001_reg001 true 686
#> CITN10Co-88_c001_v001_r001_reg002 true 686
#> CITN10Co-88_c001_v001_r001_reg003 true 686
#> CITN10Co-88_c001_v001_r001_reg004 true 686
#> CITN10Co-88_c001_v001_r001_reg005 true 686
#> CITN10Co-88_c001_v001_r001_reg006 true 686
#> experiment_label
#> CITN10Co-88_c001_v001_r001_reg001 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg002 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg003 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg004 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg005 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg006 CITN 10 Codex TMA
#> experiment_uuid
#> CITN10Co-88_c001_v001_r001_reg001 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg002 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg003 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg004 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg005 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg006 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> assay_metadata
#> CITN10Co-88_c001_v001_r001_reg001 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg002 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg003 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg004 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg005 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg006 {"image_mpp": 0.3775, "microscope": "keyence"}
#> assay_name assay_description color
#> CITN10Co-88_c001_v001_r001_reg001 CODEX grayscale
#> CITN10Co-88_c001_v001_r001_reg002 CODEX grayscale
#> CITN10Co-88_c001_v001_r001_reg003 CODEX grayscale
#> CITN10Co-88_c001_v001_r001_reg004 CODEX grayscale
#> CITN10Co-88_c001_v001_r001_reg005 CODEX grayscale
#> CITN10Co-88_c001_v001_r001_reg006 CODEX grayscale
#> assay_id axes sample_id region_id
#> CITN10Co-88_c001_v001_r001_reg001 1 TCYX 2306 3395
#> CITN10Co-88_c001_v001_r001_reg002 1 TCYX 2307 3396
#> CITN10Co-88_c001_v001_r001_reg003 1 TCYX 2308 3397
#> CITN10Co-88_c001_v001_r001_reg004 1 TCYX 2309 3398
#> CITN10Co-88_c001_v001_r001_reg005 1 TCYX 2310 3399
#> CITN10Co-88_c001_v001_r001_reg006 1 TCYX 2311 3400
#> region_display_label
#> CITN10Co-88_c001_v001_r001_reg001 07-002.Pre_1
#> CITN10Co-88_c001_v001_r001_reg002 07-002.EOT_1
#> CITN10Co-88_c001_v001_r001_reg003 07-004.Pre_1
#> CITN10Co-88_c001_v001_r001_reg004 07-004.EOT_1
#> CITN10Co-88_c001_v001_r001_reg005 41-001.EOT_2
#> CITN10Co-88_c001_v001_r001_reg006 41-005.Pre_1
#> region_uuid height
#> CITN10Co-88_c001_v001_r001_reg001 d54a64b8-dc55-41b5-bb33-aa6f606bc587 1440
#> CITN10Co-88_c001_v001_r001_reg002 4cda4b5f-7c6d-4125-a60a-13b00686f1c4 1440
#> CITN10Co-88_c001_v001_r001_reg003 29465edb-016f-447d-8482-69bb3ee7fa4c 1440
#> CITN10Co-88_c001_v001_r001_reg004 ec432465-88ce-43db-a61c-e14453855bc5 1440
#> CITN10Co-88_c001_v001_r001_reg005 2da86dca-74e3-4804-90c3-298b8d561c3e 1440
#> CITN10Co-88_c001_v001_r001_reg006 890836ff-b91a-4ce8-96d4-2f8f963b855a 1440
#> width n_levels n_channels n_cycles
#> CITN10Co-88_c001_v001_r001_reg001 1920 2 4 29
#> CITN10Co-88_c001_v001_r001_reg002 1920 2 4 29
#> CITN10Co-88_c001_v001_r001_reg003 1920 2 4 29
#> CITN10Co-88_c001_v001_r001_reg004 1920 2 4 29
#> CITN10Co-88_c001_v001_r001_reg005 1920 2 4 29
#> CITN10Co-88_c001_v001_r001_reg006 1920 2 4 29
#> time_point tissue_type tissue_subtype
#> CITN10Co-88_c001_v001_r001_reg001 Pre-treatment CTCL
#> CITN10Co-88_c001_v001_r001_reg002 Post-treatment CTCL
#> CITN10Co-88_c001_v001_r001_reg003 Pre-treatment CTCL
#> CITN10Co-88_c001_v001_r001_reg004 Post-treatment CTCL
#> CITN10Co-88_c001_v001_r001_reg005 Post-treatment CTCL
#> CITN10Co-88_c001_v001_r001_reg006 Pre-treatment CTCL
#> diagnosis treatment outcome species
#> CITN10Co-88_c001_v001_r001_reg001 Sezary Syndrome Pembro SD Human
#> CITN10Co-88_c001_v001_r001_reg002 Sezary Syndrome Pembro SD Human
#> CITN10Co-88_c001_v001_r001_reg003 Sezary Syndrome Pembro SD Human
#> CITN10Co-88_c001_v001_r001_reg004 Sezary Syndrome Pembro SD Human
#> CITN10Co-88_c001_v001_r001_reg005 Mycosis Fungoides Pembro SD Human
#> CITN10Co-88_c001_v001_r001_reg006 Sezary Syndrome Pembro PR Human
#> patient_id sample_label patient_label
#> CITN10Co-88_c001_v001_r001_reg001 817 07-002.Pre_1 110-07-002
#> CITN10Co-88_c001_v001_r001_reg002 817 07-002.EOT_1 110-07-002
#> CITN10Co-88_c001_v001_r001_reg003 818 07-004.Pre_1 110-07-004
#> CITN10Co-88_c001_v001_r001_reg004 818 07-004.EOT_1 110-07-004
#> CITN10Co-88_c001_v001_r001_reg005 819 41-001.EOT_2 110-41-001
#> CITN10Co-88_c001_v001_r001_reg006 822 41-005.Pre_1 110-41-005
region_table <- region_table %>%
  filter(visual_quality == "true")
get_study_metadata pulls basic information on your regions into a table (e.g. experiment labels as experiment_label, sample labels as sample_label, the label for each region displayed on the Portal as region_display_label). If you have used the sample traits feature on the Portal to add metadata to the study, those data will also be loaded here. For example, primary tumor site versus metastasis, disease or diagnosis, or patient response to therapy are all common metadata that should be visible here.
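One quick way to review sample traits is to cross-tabulate them. The following is a minimal base R sketch on a toy data frame (toy_regions is a hypothetical stand-in for region_table, filled with six rows of values copied from the output above), so it runs without a Portal connection.

```r
# Toy stand-in for region_table, using values shown in the output above
toy_regions <- data.frame(
  region_display_label = c("07-002.Pre_1", "07-002.EOT_1", "07-004.Pre_1",
                           "07-004.EOT_1", "41-001.EOT_2", "41-005.Pre_1"),
  time_point = c("Pre-treatment", "Post-treatment", "Pre-treatment",
                 "Post-treatment", "Post-treatment", "Pre-treatment"),
  outcome = c("SD", "SD", "SD", "SD", "SD", "PR"),
  stringsAsFactors = FALSE
)

# Count regions for each combination of time point and outcome
trait_counts <- table(toy_regions$time_point, toy_regions$outcome)
print(trait_counts)

# Count missing or empty values per column to spot incomplete labels
n_missing <- colSums(is.na(toy_regions) | toy_regions == "")
print(n_missing)
```

The same two calls should work on the real region_table once it has been loaded, substituting whichever trait columns your study defines.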
colnames(region_table)
#> [1] "study_id" "study_name" "study_uuid"
#> [4] "acquisition_id" "visual_quality" "experiment_id"
#> [7] "experiment_label" "experiment_uuid" "assay_metadata"
#> [10] "assay_name" "assay_description" "color"
#> [13] "assay_id" "axes" "sample_id"
#> [16] "region_id" "region_display_label" "region_uuid"
#> [19] "height" "width" "n_levels"
#> [22] "n_channels" "n_cycles" "time_point"
#> [25] "tissue_type" "tissue_subtype" "diagnosis"
#> [28] "treatment" "outcome" "species"
#> [31] "patient_id" "sample_label" "patient_label"
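The assay_metadata column holds a small JSON string per region. As a rough sketch, and assuming that image_mpp means microns per pixel and that height and width are pixel counts (neither is stated in this guide), the physical size of a region could be estimated as follows. The values are copied from the first row of the output above; the regular expression is a shortcut for this one field, and a JSON parser such as jsonlite::fromJSON would be more robust.

```r
# Values copied from the first row of the metadata output above
assay_metadata <- '{"image_mpp": 0.3775, "microscope": "keyence"}'
height <- 1440
width <- 1920

# Pull the numeric image_mpp value out of the JSON string
mpp <- as.numeric(
  sub('.*"image_mpp":[[:space:]]*([0-9.]+).*', "\\1", assay_metadata)
)

# Assumption: image_mpp is microns per pixel, so size = pixels * mpp
height_um <- height * mpp
width_um <- width * mpp
print(c(height_um = height_um, width_um = width_um))
```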
If this query returns no data, or if some of your labels are missing, see the Tutorial vignette on troubleshooting.
As demonstrated above, you can filter on any combination of these traits using the filter function from the package dplyr. For more information on how to use dplyr for analysis (highly recommended!) type ?dplyr::filter into your console, or see the documentation for the dplyr package.
# Example code (placeholder column names and values, not real ones)
region_table <- filter(region_table,
                       experiment_label == "specific experiment",
                       disease_type == "specific condition",
                       patient_outcome == "responder")
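For a version that uses column names actually present in this study, here is a minimal sketch on a toy data frame (toy_regions is a hypothetical stand-in for region_table, with values copied from the metadata output above). It is written in base R so that it runs without extra packages; the equivalent dplyr call is shown in a comment.

```r
# Toy stand-in for region_table, using values shown in the output above
toy_regions <- data.frame(
  region_display_label = c("07-002.Pre_1", "07-002.EOT_1", "07-004.Pre_1",
                           "07-004.EOT_1", "41-001.EOT_2", "41-005.Pre_1"),
  time_point = c("Pre-treatment", "Post-treatment", "Pre-treatment",
                 "Post-treatment", "Post-treatment", "Pre-treatment"),
  diagnosis = c("Sezary Syndrome", "Sezary Syndrome", "Sezary Syndrome",
                "Sezary Syndrome", "Mycosis Fungoides", "Sezary Syndrome"),
  stringsAsFactors = FALSE
)

# Keep pre-treatment regions from patients diagnosed with Sezary Syndrome.
# dplyr equivalent:
#   dplyr::filter(toy_regions, time_point == "Pre-treatment",
#                 diagnosis == "Sezary Syndrome")
keep <- toy_regions$time_point == "Pre-treatment" &
  toy_regions$diagnosis == "Sezary Syndrome"
pre_sezary <- toy_regions[keep, ]
print(pre_sezary$region_display_label)
```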
If you’re running through the analysis guides interactively, note that the version of the study "Immune cell topography predicts response to PD-1 blockade in cutaneous T cell lymphoma" that is presented in the knitted versions of these vignettes in the SpatialMap documentation represents a 10-sample subset of the full published study on the Portal. We can use dplyr::filter here again to filter down to just those 10 regions.
subset_regions <- c("07-002.Pre_1",
                    "07-002.EOT_1",
                    "07-004.Pre_1",
                    "07-004.EOT_1",
                    "41-001.EOT_2",
                    "41-005.Pre_1",
                    "41-003.Pre_1",
                    "41-003.Resp_2",
                    "41-004.Pre_2",
                    "41-004.EOT_2")
region_table <- filter(region_table,
                       region_display_label %in% subset_regions)
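Because %in% silently ignores labels that have no match, it can be worth confirming that every requested label was found. A minimal base R sketch follows; available_labels is a hypothetical stand-in for region_table$region_display_label, and the shortened label vectors are for illustration only.

```r
# Labels we asked for (shortened for illustration)
subset_regions <- c("07-002.Pre_1", "07-002.EOT_1", "41-003.Pre_1")

# Toy stand-in for region_table$region_display_label
available_labels <- c("07-002.Pre_1", "07-002.EOT_1", "07-004.Pre_1")

# Requested labels with no matching region in the table
missing_labels <- setdiff(subset_regions, available_labels)

if (length(missing_labels) > 0) {
  message("Labels not found: ", paste(missing_labels, collapse = ", "))
}
```

An empty missing_labels vector means the filter kept one or more regions for every label requested.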
Note that since these samples were segmented independently from the example analysis on the documentation site, and segmentation is a somewhat stochastic process, there may be some slight differences in analysis results. If you’d like to exactly replicate this analysis, the outputs saved by each of these notebooks are available to you in the same data asset that contains the vignette source code (at /data/spatialmap_vignettes/spatialmap_analysis_guides/ after you’ve attached the data to your capsule). Starting from Analysis Guide 2, you can load the saved object in and reproduce the results in the vignettes. You can also change the STUDY_NAME to match any study you’d like to analyze, and use these Analysis Guides as a starting point for your own dataset!
Load the SM object
This will query ATLAS and load all the cell coordinates, annotations, and biomarker expression profiles from the specified regions.
sm <- spatialmap_from_db(acquisition_ids = region_table$acquisition_id,
                         study_names = STUDY_NAME,
                         expression.version = BIOMARKER_EXPRESSION_VERSION,
                         neutral.markers = c(DNA_CHANNEL, NEUTRAL_MARKERS))
#> Querying database!
#> Channel information...
#> Success!
Specifying neutral.markers is optional depending on your dataset, since spatialmap_from_db does have some default values for this parameter that work well for many CODEX experiments. See ?spatialmap_from_db for more information.
If this function gives an error, check out the Tutorial vignette on troubleshooting.
Once this process is complete, you can begin your analysis! Save this object in your working directory, and proceed to the next step in the analysis guide, QC filtering.
data_dir <- "."
# facil::check_dir is useful if you're running this yourself on Code Ocean
# you may want to set data_dir to /scratch or /results
write_path <- facil::check_dir(data_dir)$write_dir
saveRDS(sm, file.path(write_path, "sm_preQC.RDS"))
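To pick the object back up in a later session, readRDS is the base R counterpart to saveRDS. The sketch below round-trips a toy list through a temporary directory; toy_object and the file name sm_preQC_toy.RDS are hypothetical stand-ins, used so the example runs without a real SpatialMap object.

```r
# Toy stand-in for the sm object
toy_object <- list(regions = c("reg001", "reg002"), version = 1)

# Write to a temporary directory so nothing in the project is overwritten
out_file <- file.path(tempdir(), "sm_preQC_toy.RDS")
saveRDS(toy_object, out_file)

# Read it back, as you would at the start of the next guide
restored <- readRDS(out_file)

# Confirm the round trip preserved the object
stopifnot(identical(toy_object, restored))
```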
For an in-depth exploration of the structure of a SpatialMap object, take a look at our quickstart guide vignette!
Appendix
BIOMARKER_EXPRESSION_VERSION
The designations of each of the biomarker expression versions associated with your study are available on the Study Details page, which you can access by clicking on your study name in the Designer on the Portal. Once you are on the Study Details page, scroll down and select Analysis Version Metadata to see a list of biomarker expression versions.