Analysis Guide Part I


Data access and the `emconnect` package

emconnect is an R package developed by Enable Medicine that facilitates real-time communication between Workbench and ATLAS. Any SpatialMap functions that pull data from or push data to the Portal use emconnect under the hood. It also provides some useful convenience functions that we’ll use in this vignette.

Note: you used to have to wait for daily database exports to access your data, but emconnect was recently migrated to a new real-time data access tool that eliminates this delay! You also no longer need to create a database connection to use emconnect: database connections and credential checks are handled through an API key, which you can generate on your profile page on the Portal. This migration also means that any data published on the Portal can be queried by any user on Workbench!

Set study parameters

It’s good practice to set a few variables in your global environment before getting started; this makes it easier to adapt the code later, for example to analyze a different study.

STUDY_NAME is the title of your study on the Portal.
BIOMARKER_EXPRESSION_VERSION specifies which biomarker expression calculation you wish to analyze in this session. In most cases, you will want to analyze the most recent biomarker expression version. See the Appendix below for more details.
DNA_CHANNEL is the channel for the nuclear signal in your dataset. This channel is handled separately from the other biomarker channels in SpatialMap.
NEUTRAL_MARKERS are any other channels in your dataset that aren’t typical biomarkers. This slot is flexible, but it often includes “Blank” or background channels. These will also be handled separately from other biomarkers.

STUDY_NAME <- "Immune cell topography predicts response to PD-1 blockade in cutaneous T cell lymphoma"
BIOMARKER_EXPRESSION_VERSION <- 1
DNA_CHANNEL <- "DAPI"
NEUTRAL_MARKERS <- c("Blank") # as a vector, since you may want to pass more than one value
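The DNA channel and the neutral markers are combined into a single character vector when they are passed to spatialmap_from_db() as neutral.markers. The short base R sketch below shows what that combined vector looks like. NEUTRAL_MARKERS_EXAMPLE and the channel name "Empty" are hypothetical placeholders for a panel with more than one non-biomarker channel; use the channel names from your own panel.

```r
# Study parameters, as set above
DNA_CHANNEL <- "DAPI"
NEUTRAL_MARKERS <- c("Blank")

# Combined vector, as later passed to the neutral.markers argument
neutral_channels <- c(DNA_CHANNEL, NEUTRAL_MARKERS)
print(neutral_channels)  # "DAPI" "Blank"

# Hypothetical panel with two non-biomarker channels ("Empty" is a placeholder)
NEUTRAL_MARKERS_EXAMPLE <- c("Blank", "Empty")
neutral_channels_example <- c(DNA_CHANNEL, NEUTRAL_MARKERS_EXAMPLE)
print(neutral_channels_example)  # "DAPI" "Blank" "Empty"
```

A separate name is used for the hypothetical vector so that running the sketch does not overwrite the NEUTRAL_MARKERS value you set for your study.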

Regions in this study

This next chunk uses emconnect to pull the study metadata for the regions in your study, then keeps only the regions that pass visual QC. You can take a moment here to examine your samples, make sure everything looks right, and, if you wish, filter out any samples before pulling data into your session. You can also filter the data later.

region_table <- get_study_metadata(study_names = STUDY_NAME)
head(region_table)
#>                                   study_id    study_name
#> CITN10Co-88_c001_v001_r001_reg001      339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg002      339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg003      339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg004      339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg005      339 CITN 10 Codex
#> CITN10Co-88_c001_v001_r001_reg006      339 CITN 10 Codex
#>                                                             study_uuid
#> CITN10Co-88_c001_v001_r001_reg001 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg002 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg003 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg004 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg005 1cbf0bff-78a4-4062-8481-0e3e1d454949
#> CITN10Co-88_c001_v001_r001_reg006 1cbf0bff-78a4-4062-8481-0e3e1d454949
#>                                                      acquisition_id
#> CITN10Co-88_c001_v001_r001_reg001 CITN10Co-88_c001_v001_r001_reg001
#> CITN10Co-88_c001_v001_r001_reg002 CITN10Co-88_c001_v001_r001_reg002
#> CITN10Co-88_c001_v001_r001_reg003 CITN10Co-88_c001_v001_r001_reg003
#> CITN10Co-88_c001_v001_r001_reg004 CITN10Co-88_c001_v001_r001_reg004
#> CITN10Co-88_c001_v001_r001_reg005 CITN10Co-88_c001_v001_r001_reg005
#> CITN10Co-88_c001_v001_r001_reg006 CITN10Co-88_c001_v001_r001_reg006
#>                                   visual_quality experiment_id
#> CITN10Co-88_c001_v001_r001_reg001           true           686
#> CITN10Co-88_c001_v001_r001_reg002           true           686
#> CITN10Co-88_c001_v001_r001_reg003           true           686
#> CITN10Co-88_c001_v001_r001_reg004           true           686
#> CITN10Co-88_c001_v001_r001_reg005           true           686
#> CITN10Co-88_c001_v001_r001_reg006           true           686
#>                                    experiment_label
#> CITN10Co-88_c001_v001_r001_reg001 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg002 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg003 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg004 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg005 CITN 10 Codex TMA
#> CITN10Co-88_c001_v001_r001_reg006 CITN 10 Codex TMA
#>                                                        experiment_uuid
#> CITN10Co-88_c001_v001_r001_reg001 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg002 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg003 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg004 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg005 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#> CITN10Co-88_c001_v001_r001_reg006 caeaf8a9-726b-4025-9d28-a8b3f8313b57
#>                                                                   assay_metadata
#> CITN10Co-88_c001_v001_r001_reg001 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg002 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg003 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg004 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg005 {"image_mpp": 0.3775, "microscope": "keyence"}
#> CITN10Co-88_c001_v001_r001_reg006 {"image_mpp": 0.3775, "microscope": "keyence"}
#>                                   assay_name assay_description     color
#> CITN10Co-88_c001_v001_r001_reg001      CODEX                   grayscale
#> CITN10Co-88_c001_v001_r001_reg002      CODEX                   grayscale
#> CITN10Co-88_c001_v001_r001_reg003      CODEX                   grayscale
#> CITN10Co-88_c001_v001_r001_reg004      CODEX                   grayscale
#> CITN10Co-88_c001_v001_r001_reg005      CODEX                   grayscale
#> CITN10Co-88_c001_v001_r001_reg006      CODEX                   grayscale
#>                                   assay_id axes sample_id region_id
#> CITN10Co-88_c001_v001_r001_reg001        1 TCYX      2306      3395
#> CITN10Co-88_c001_v001_r001_reg002        1 TCYX      2307      3396
#> CITN10Co-88_c001_v001_r001_reg003        1 TCYX      2308      3397
#> CITN10Co-88_c001_v001_r001_reg004        1 TCYX      2309      3398
#> CITN10Co-88_c001_v001_r001_reg005        1 TCYX      2310      3399
#> CITN10Co-88_c001_v001_r001_reg006        1 TCYX      2311      3400
#>                                   region_display_label
#> CITN10Co-88_c001_v001_r001_reg001         07-002.Pre_1
#> CITN10Co-88_c001_v001_r001_reg002         07-002.EOT_1
#> CITN10Co-88_c001_v001_r001_reg003         07-004.Pre_1
#> CITN10Co-88_c001_v001_r001_reg004         07-004.EOT_1
#> CITN10Co-88_c001_v001_r001_reg005         41-001.EOT_2
#> CITN10Co-88_c001_v001_r001_reg006         41-005.Pre_1
#>                                                            region_uuid height
#> CITN10Co-88_c001_v001_r001_reg001 d54a64b8-dc55-41b5-bb33-aa6f606bc587   1440
#> CITN10Co-88_c001_v001_r001_reg002 4cda4b5f-7c6d-4125-a60a-13b00686f1c4   1440
#> CITN10Co-88_c001_v001_r001_reg003 29465edb-016f-447d-8482-69bb3ee7fa4c   1440
#> CITN10Co-88_c001_v001_r001_reg004 ec432465-88ce-43db-a61c-e14453855bc5   1440
#> CITN10Co-88_c001_v001_r001_reg005 2da86dca-74e3-4804-90c3-298b8d561c3e   1440
#> CITN10Co-88_c001_v001_r001_reg006 890836ff-b91a-4ce8-96d4-2f8f963b855a   1440
#>                                   width n_levels n_channels n_cycles
#> CITN10Co-88_c001_v001_r001_reg001  1920        2          4       29
#> CITN10Co-88_c001_v001_r001_reg002  1920        2          4       29
#> CITN10Co-88_c001_v001_r001_reg003  1920        2          4       29
#> CITN10Co-88_c001_v001_r001_reg004  1920        2          4       29
#> CITN10Co-88_c001_v001_r001_reg005  1920        2          4       29
#> CITN10Co-88_c001_v001_r001_reg006  1920        2          4       29
#>                                       time_point tissue_type tissue_subtype
#> CITN10Co-88_c001_v001_r001_reg001  Pre-treatment        CTCL               
#> CITN10Co-88_c001_v001_r001_reg002 Post-treatment        CTCL               
#> CITN10Co-88_c001_v001_r001_reg003  Pre-treatment        CTCL               
#> CITN10Co-88_c001_v001_r001_reg004 Post-treatment        CTCL               
#> CITN10Co-88_c001_v001_r001_reg005 Post-treatment        CTCL               
#> CITN10Co-88_c001_v001_r001_reg006  Pre-treatment        CTCL               
#>                                           diagnosis treatment outcome species
#> CITN10Co-88_c001_v001_r001_reg001   Sezary Syndrome    Pembro      SD   Human
#> CITN10Co-88_c001_v001_r001_reg002   Sezary Syndrome    Pembro      SD   Human
#> CITN10Co-88_c001_v001_r001_reg003   Sezary Syndrome    Pembro      SD   Human
#> CITN10Co-88_c001_v001_r001_reg004   Sezary Syndrome    Pembro      SD   Human
#> CITN10Co-88_c001_v001_r001_reg005 Mycosis Fungoides    Pembro      SD   Human
#> CITN10Co-88_c001_v001_r001_reg006   Sezary Syndrome    Pembro      PR   Human
#>                                   patient_id sample_label patient_label
#> CITN10Co-88_c001_v001_r001_reg001        817 07-002.Pre_1    110-07-002
#> CITN10Co-88_c001_v001_r001_reg002        817 07-002.EOT_1    110-07-002
#> CITN10Co-88_c001_v001_r001_reg003        818 07-004.Pre_1    110-07-004
#> CITN10Co-88_c001_v001_r001_reg004        818 07-004.EOT_1    110-07-004
#> CITN10Co-88_c001_v001_r001_reg005        819 41-001.EOT_2    110-41-001
#> CITN10Co-88_c001_v001_r001_reg006        822 41-005.Pre_1    110-41-005

region_table <- region_table %>% 
  filter(visual_quality == "true")

get_study_metadata() pulls basic information on your regions into a table (e.g. experiment labels as experiment_label, sample labels as sample_label, and the label displayed for each region on the Portal as region_display_label). If you have used the sample traits feature on the Portal to add metadata to the study, those data are loaded here as well. Common examples include primary tumor site versus metastasis, disease or diagnosis, and patient response to therapy.

colnames(region_table)
#>  [1] "study_id"             "study_name"           "study_uuid"          
#>  [4] "acquisition_id"       "visual_quality"       "experiment_id"       
#>  [7] "experiment_label"     "experiment_uuid"      "assay_metadata"      
#> [10] "assay_name"           "assay_description"    "color"               
#> [13] "assay_id"             "axes"                 "sample_id"           
#> [16] "region_id"            "region_display_label" "region_uuid"         
#> [19] "height"               "width"                "n_levels"            
#> [22] "n_channels"           "n_cycles"             "time_point"          
#> [25] "tissue_type"          "tissue_subtype"       "diagnosis"           
#> [28] "treatment"            "outcome"              "species"             
#> [31] "patient_id"           "sample_label"         "patient_label"
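Judging from the printed table, where it appears as true rather than TRUE, the visual_quality column holds text rather than an R logical, which is why the QC filter above compares it with the string "true". The sketch below uses a small hypothetical toy_regions data frame (placeholder labels and QC values, not real data) in place of region_table to show that comparison, plus an optional conversion to TRUE/FALSE with as.logical().

```r
library(dplyr)

# Hypothetical toy stand-in for region_table; labels and QC values are placeholders
toy_regions <- data.frame(
  region_display_label = c("region_A", "region_B", "region_C"),
  visual_quality = c("true", "false", "true"),
  stringsAsFactors = FALSE
)

# visual_quality is stored as text, not as a logical
class(toy_regions$visual_quality)  # "character"

# Compare against the string "true", as in the QC filter above
qc_pass <- filter(toy_regions, visual_quality == "true")

# Optionally convert to a logical column first, then filter on it directly
toy_lgl <- mutate(toy_regions, visual_quality = as.logical(visual_quality))
qc_pass_lgl <- filter(toy_lgl, visual_quality)
```

Both approaches keep the same rows; converting once can make later code read more naturally if you filter on this column repeatedly.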

If this query returns no data, or if some of your labels are missing, see the Tutorial vignette on troubleshooting.
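A quick way to spot these problems is to check for an empty table, for trait columns you expected but did not get, and for blank region labels. In this sketch, toy_regions and expected_traits are hypothetical stand-ins; in your session you would run the same checks on region_table with the trait names you added on the Portal.

```r
# Hypothetical toy stand-in for region_table: "outcome" is deliberately absent
# and one display label is blank
toy_regions <- data.frame(
  region_display_label = c("region_A", ""),
  diagnosis = c("Sezary Syndrome", "Sezary Syndrome"),
  stringsAsFactors = FALSE
)

# Trait columns you expect to see (placeholders; use your own trait names)
expected_traits <- c("diagnosis", "outcome")

# Flag a query that returned no regions at all
if (nrow(toy_regions) == 0) {
  warning("No regions returned; see the Tutorial vignette on troubleshooting")
}

# Expected trait columns that are not present in the table
missing_traits <- setdiff(expected_traits, colnames(toy_regions))
print(missing_traits)  # "outcome"

# Regions whose display label is missing or empty
blank_labels <- is.na(toy_regions$region_display_label) |
  toy_regions$region_display_label == ""
print(sum(blank_labels))  # 1
```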

As demonstrated above, you can filter on any combination of these traits with the filter() function from the dplyr package. For more on using dplyr for analysis (highly recommended!), type ?dplyr::filter into your console or see the dplyr package documentation.

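# Example code: the values are illustrative, so adjust them to your own traits.
# Assigning to a new object keeps region_table intact if you run this chunk.
example_regions <- filter(region_table,
                          experiment_label == "CITN 10 Codex TMA",
                          diagnosis == "Sezary Syndrome",
                          outcome == "PR")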
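In filter(), conditions separated by commas must all hold (a logical AND). To keep rows that match any of several conditions, combine them with | or use %in% with a set of values. The toy_regions data frame below is a hypothetical stand-in for region_table, built from values visible in the printed output above.

```r
library(dplyr)

# Hypothetical toy stand-in for region_table, using values from the printed output
toy_regions <- data.frame(
  region_display_label = c("07-002.Pre_1", "07-002.EOT_1", "41-001.EOT_2", "41-005.Pre_1"),
  time_point = c("Pre-treatment", "Post-treatment", "Post-treatment", "Pre-treatment"),
  diagnosis = c("Sezary Syndrome", "Sezary Syndrome", "Mycosis Fungoides", "Sezary Syndrome"),
  outcome = c("SD", "SD", "SD", "PR"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions are combined with AND
pre_sezary <- filter(toy_regions,
                     time_point == "Pre-treatment",
                     diagnosis == "Sezary Syndrome")

# Use | to keep rows matching either condition
pr_or_mf <- filter(toy_regions, outcome == "PR" | diagnosis == "Mycosis Fungoides")

# Use %in% to keep rows whose value is any member of a set
sd_or_pr <- filter(toy_regions, outcome %in% c("SD", "PR"))
```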

If you’re running through the analysis guides interactively, note that the knitted versions of these vignettes in the SpatialMap documentation use a 10-sample subset of the full study "Immune cell topography predicts response to PD-1 blockade in cutaneous T cell lymphoma" as published on the Portal. We can use dplyr::filter again here to restrict the data to those 10 regions.

subset_regions <- c("07-002.Pre_1",
                    "07-002.EOT_1",
                    "07-004.Pre_1",
                    "07-004.EOT_1",
                    "41-001.EOT_2",
                    "41-005.Pre_1",
                    "41-003.Pre_1",
                    "41-003.Resp_2",
                    "41-004.Pre_2",
                    "41-004.EOT_2")


region_table <- filter(region_table,
                       region_display_label %in% subset_regions)
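Because this subset filter runs after the visual QC filter, a label that is misspelled or that belongs to a region that failed QC would silently drop out. A minimal check is to list any expected labels that were not found. The equivalent call for your session is shown in the first comment; the sketch itself uses hypothetical toy vectors so it runs on its own.

```r
# In your session: setdiff(subset_regions, region_table$region_display_label)
# Hypothetical toy vectors stand in for those two objects here
expected_labels <- c("region_A", "region_B", "region_C")
found_labels <- c("region_A", "region_C")

# Labels you asked for that are absent from the filtered table
not_found <- setdiff(expected_labels, found_labels)
if (length(not_found) > 0) {
  message("Not found in region_table: ", paste(not_found, collapse = ", "))
}
```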

Note that these samples were segmented independently of the example analysis on the documentation site, and because segmentation is a somewhat stochastic process, your results may differ slightly. If you’d like to replicate this analysis exactly, the outputs saved by each of these notebooks are available in the same data asset that contains the vignette source code (at /data/spatialmap_vignettes/spatialmap_analysis_guides/ after you’ve attached the data to your capsule). Starting from Analysis Guide 2, you can load these saved objects to reproduce the results in the vignettes. You can also change STUDY_NAME to match any study you’d like to analyze and use these Analysis Guides as a starting point for your own dataset!

Load the SM object

spatialmap_from_db() queries ATLAS and loads all the cell coordinates, annotations, and biomarker expression profiles for the specified regions.

sm <- spatialmap_from_db(acquisition_ids = region_table$acquisition_id,
                         study_names = STUDY_NAME,
                         expression.version = BIOMARKER_EXPRESSION_VERSION,
                         neutral.markers = c(DNA_CHANNEL, NEUTRAL_MARKERS))
#> Querying database!
#> Channel information...
#> Success!

Specifying neutral.markers is optional, depending on your dataset: spatialmap_from_db has default values for this parameter that work well for many CODEX experiments. See ?spatialmap_from_db for more information.

If this function gives an error, check out the Tutorial vignette on troubleshooting.

Once this process is complete, you can begin your analysis! Save this object to disk (the chunk below writes it to the directory set by data_dir) and proceed to the next step in the analysis guide, QC filtering.

data_dir <- "."
# facil::check_dir() is useful if you're running this yourself on Code Ocean;
# there, you may want to set data_dir to "/scratch" or "/results"
write_path <- facil::check_dir(data_dir)$write_dir

saveRDS(sm, file.path(write_path, "sm_preQC.RDS"))
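saveRDS() writes a single R object to disk and readRDS() restores it, so the saved object can presumably be reloaded at the start of the next guide with something like readRDS(file.path(write_path, "sm_preQC.RDS")). The sketch below round-trips a small hypothetical toy object through a temporary file, so it runs without a SpatialMap object or access to the Portal.

```r
# Hypothetical toy object standing in for sm; any R object round-trips the same way
toy_obj <- list(study = "toy study", n_regions = 10L)

# Write to a temporary directory so nothing is left in your project
toy_path <- file.path(tempdir(), "sm_preQC.RDS")
saveRDS(toy_obj, toy_path)

# Restore the object, as you would at the start of the next guide
reloaded <- readRDS(toy_path)
identical(reloaded, toy_obj)  # TRUE
```

Unlike save()/load(), readRDS() returns the object rather than restoring it under its original name, so you can assign it to whatever name you like (for example, sm).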

For an in-depth exploration of the structure of a SpatialMap object, take a look at our quickstart guide vignette!

Appendix

`BIOMARKER_EXPRESSION_VERSION`

The biomarker expression versions associated with your study are listed on the Study Details page, which you can open by clicking your study name in the Designer on the Portal. On the Study Details page, scroll down and select Analysis Version Metadata to see the list of biomarker expression versions.

Loading SpatialMap Objects from Enable ATLAS