Overview

In this tutorial we will walk you through generating OAR scores from your single-cell dataset, split by a factor in your data. We will begin with a Seurat object containing Dictyostelium discoideum cells (dicty) that you can download from our GitHub repository.

This approach is usually recommended when you have multiple cell types in your dataset or when your experimental conditions introduce dramatic shifts in gene expression patterns. Our approach works best at detecting transcriptional shifts within a set of cells that you expect to be somewhat similar. Consequently, in situations like the ones outlined above, it makes more sense to split the data by a factor before running the analysis.

First, we will take a look at how the test performs when the data is not split. Later we will run the analysis split by factor. As always, we begin by loading the Seurat object with the dataset we are interested in.

library(Seurat)
#> Loading required package: SeuratObject
#> Loading required package: sp
#> 'SeuratObject' was built under R 4.5.0 but the current version is
#> 4.5.1; it is recomended that you reinstall 'SeuratObject' as the ABI
#> for R may have changed
#> 
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, t
sc.data <- readRDS(file = "dicty.rds")
sc.data
#> An object of class Seurat 
#> 13206 features across 5000 samples within 1 assay 
#> Active assay: RNA (13206 features, 2000 variable features)
#>  1 layer present: counts
#>  1 dimensional reduction calculated: umap

1. Running the full test

We can run the entire test with a single line. For a description of all these parameters, see our other vignettes. The result is the Seurat object with the OARscore, KW.pvalue, KW.BH.pvalue and pct.missing values added to the meta.data slot.

sc.data <- oar(data = sc.data, 
               seurat_v5 = T, count.filter = 1,
               blacklisted.genes = NULL, suffix = "",
               store.hamming = F,
               cores = 1)
#> Warning in oar(data = sc.data, seurat_v5 = T, count.filter = 1, blacklisted.genes = NULL, : Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:19 UTC"
#> [1] "Identifying missing data patterns..."
...
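
Before plotting, it can be worth confirming that the four result columns named above are actually present. The sketch below is a minimal, base-R illustration that uses a small made-up data frame as a stand-in for the object's meta.data; the helper `check_oar_columns()` and the toy values are hypothetical and not part of the package.

```r
# Toy stand-in for the Seurat object's meta.data; values are made up.
meta <- data.frame(
  group        = c("cond_a", "cond_a", "cond_b", "cond_b"),
  OARscore     = c(-0.4, 1.2, 3.1, 2.7),
  KW.pvalue    = c(0.60, 0.03, 0.001, 0.002),
  KW.BH.pvalue = c(0.60, 0.04, 0.004, 0.004),
  pct.missing  = c(91.2, 88.5, 80.1, 82.3)
)

# Hypothetical helper: return the expected result columns that are absent.
check_oar_columns <- function(meta,
                              expected = c("OARscore", "KW.pvalue",
                                           "KW.BH.pvalue", "pct.missing")) {
  setdiff(expected, colnames(meta))
}

missing_cols <- check_oar_columns(meta)
if (length(missing_cols) == 0) {
  # All columns found: print a quick numeric summary of the scores.
  print(summary(meta$OARscore))
} else {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

With the real object, `meta` would be taken from `sc.data@meta.data` instead of the toy data frame.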

To explore the results we can visualize OARscore vs pct.missing values, coloring the cells based on some of the attributes in our Seurat object meta.data.

library(patchwork)
library(ggplot2)
p1 <- scatter_score_missing(sc.data, pt.size = 0.5)+NoLegend()
p2 <- scatter_score_missing(sc.data, pt.size = 0.5, group.by = "group")
p1+p2

Scatter Plots by Factors

Here we can see that the underlying biological conditions result in very different sets of OAR scores. In particular, 10 h of starvation induces a completely separate profile of values. For that reason, running the test separately for each level of this biological variable will allow us to prioritize cells within each condition for further analysis.
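
The separation seen in the plot can also be checked numerically by summarising the pooled scores per condition. The following is a minimal sketch on a made-up data frame standing in for the meta.data; the group labels and score values are illustrative assumptions, not results from the dicty dataset.

```r
# Toy stand-in for meta.data: three hypothetical conditions, four cells each.
meta <- data.frame(
  group    = rep(c("cond_a", "cond_b", "cond_c"), each = 4),
  OARscore = c(-1.0, -0.5, 0.0, 0.5,
                0.0,  0.5, 1.0, 1.5,
                3.0,  3.5, 4.0, 4.5)
)

# Median pooled score within each condition.
group_medians <- tapply(meta$OARscore, meta$group, median)
print(group_medians)
```

If one condition's median sits far from the others, as `cond_c` does in this toy example, that is a hint that scores computed on the pooled data mostly reflect the condition rather than differences among cells within it.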

2. Analysis by Factor

OAR scores are typically more informative when distinguishing among cells of the same type. When working with a dataset that contains diverse cell types or is dramatically affected by a biological variable, it can be helpful to split the data by that factor and run the test independently on each subset. This can be easily accomplished with our wrapper function, which takes the same parameters discussed above and returns a Seurat object containing the results of the analysis.

sc.data <- oar_by_factor(sc.data, cores = 1, factor = "group", suffix = ".factor")
#> [1] "Splitting data by specified factor..."
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:36 UTC"
#> [1] "Identifying missing data patterns..."
...
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:45 UTC"
#> [1] "Identifying missing data patterns..."
...
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:49 UTC"
#> [1] "Identifying missing data patterns..."
...
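
Conceptually, running the test by factor is a split-apply-combine operation: split the cells by the factor, analyse each subset on its own, and reassemble the results. The sketch below illustrates only that pattern in base R. It is not the package's implementation, and the per-subset function `score_subset()` is a hypothetical placeholder (a within-group z-score) rather than the OAR test.

```r
# Toy per-cell table; "value" stands in for whatever the real test works on.
cells <- data.frame(
  cell  = paste0("cell", 1:6),
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 10, 20, 30)
)

# Placeholder per-subset analysis (NOT the OAR test): z-score within the subset.
# The suffix is appended to the new column name, mirroring the `suffix` idea.
score_subset <- function(df, suffix = ".factor") {
  df[[paste0("score", suffix)]] <- as.numeric(scale(df$value))
  df
}

pieces   <- split(cells, cells$group)      # 1. split cells by the factor
scored   <- lapply(pieces, score_subset)   # 2. analyse each subset independently
combined <- unsplit(scored, cells$group)   # 3. reassemble in the original order

print(combined)
```

Because each subset is scored against its own distribution, both toy groups end up on the same scale even though their raw values differ tenfold.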

Now let us examine the outcome of this analysis:

p3 <- scatter_score_missing(sc.data, pt.size = 0.5, suffix = ".factor")+NoLegend()
p4 <- scatter_score_missing(sc.data, pt.size = 0.5, group.by = "group", suffix = ".factor")
p3+p4

Scatter Plots by Factors

As you can see, we are able to identify transcriptional shifts within each biological condition. Moreover, we can visualize these results in a UMAP projection of the data using Seurat::FeaturePlot().

p5 <- FeaturePlot(
  sc.data, features = "OARscore.factor", order = T, pt.size = 0.5, 
  min.cutoff = "q40", max.cutoff = "q90")
#> Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
#>  Please use the `layer` argument instead.
#>  The deprecated feature was likely used in the Seurat package.
#>   Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
p5

Results by factor
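
One possible way to prioritize cells within each condition is to flag those whose per-factor score exceeds a within-group quantile. The sketch below is a base-R illustration on made-up values standing in for the meta.data. The 80th-percentile cutoff, the choice to flag high rather than low scores, and the `high_oar` column name are all assumptions made for the example, not recommendations from the package.

```r
# Toy stand-in for meta.data after the by-factor analysis; values are made up.
meta <- data.frame(
  group           = rep(c("cond_a", "cond_b"), each = 5),
  OARscore.factor = c(0.1, 0.5, 0.9, 1.3, 2.8,
                      -0.2, 0.0, 0.4, 0.6, 3.5)
)

# Per-cell cutoff: the 80th percentile of the score within the cell's own group.
cutoff <- ave(
  meta$OARscore.factor, meta$group,
  FUN = function(x) quantile(x, probs = 0.8, names = FALSE)
)

# Flag cells scoring above their group's cutoff.
meta$high_oar <- meta$OARscore.factor > cutoff

# Number of flagged cells per condition.
print(table(meta$group, meta$high_oar))
```

Using a within-group cutoff keeps a condition with globally higher scores from dominating the selection.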