Overview
In this tutorial we walk you through generating OAR scores for a single-cell dataset split by a factor in its metadata. We begin with a Seurat object of Dictyostelium discoideum (dicty) cells, which you can download from our GitHub repository.
This approach is usually recommended when your dataset contains multiple cell types, or when your experimental conditions cause dramatic shifts in gene expression. Our approach works best at detecting transcriptional shifts within a set of cells that you expect to be broadly similar. In situations like these, it therefore makes more sense to split the data by a factor before running the analysis.
First, we will look at how the test performs when the data are not split; later, we will run the analysis split by a factor. As always, we begin by loading the Seurat object containing the dataset of interest.
library(Seurat)
#> Loading required package: SeuratObject
#> Loading required package: sp
#> 'SeuratObject' was built under R 4.5.0 but the current version is
#> 4.5.1; it is recomended that you reinstall 'SeuratObject' as the ABI
#> for R may have changed
#>
#> Attaching package: 'SeuratObject'
#> The following objects are masked from 'package:base':
#>
#> intersect, t
# dicty.rds can be downloaded from our GitHub repository
sc.data <- readRDS(file = "dicty.rds")
sc.data
#> An object of class Seurat
#> 13206 features across 5000 samples within 1 assay
#> Active assay: RNA (13206 features, 2000 variable features)
#> 1 layer present: counts
#> 1 dimensional reduction calculated: umap
1. Running the full test
We can run the entire test with a single function call. For a description of all these parameters, see our other vignettes. The result is the Seurat object with the OARscore, KW.pvalue, KW.BH.pvalue and pct.missing values added to its meta.data slot.
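Judging by the column names (an assumption, not a documented guarantee), KW.pvalue holds a Kruskal-Wallis p-value for each cell and KW.BH.pvalue its Benjamini-Hochberg (BH) adjusted counterpart. Assuming so, the adjustment corresponds to base R's p.adjust(method = "BH"), sketched here on made-up p-values:

```r
# Assumption: KW.BH.pvalue is the Benjamini-Hochberg adjustment of KW.pvalue
# across cells. These p-values are invented for illustration only.
kw_pvalue <- c(cell_1 = 0.001, cell_2 = 0.008, cell_3 = 0.040,
               cell_4 = 0.400, cell_5 = 0.900)

# BH adjustment across all cells (stats::p.adjust keeps the cell names)
kw_bh_pvalue <- p.adjust(kw_pvalue, method = "BH")

# Cells passing a 5% false discovery rate threshold
significant_cells <- names(kw_bh_pvalue)[kw_bh_pvalue < 0.05]
significant_cells
```

With real results, the same kind of filter could be applied to the KW.BH.pvalue column of sc.data@meta.data.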
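# Parameters are described in our other vignettes;
# cores = 1 triggers the warning shown below
sc.data <- oar(data = sc.data,
               seurat_v5 = TRUE, count.filter = 1,
               blacklisted.genes = NULL, suffix = "",
               store.hamming = FALSE,
               cores = 1)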
#> Warning in oar(data = sc.data, seurat_v5 = T, count.filter = 1, blacklisted.genes = NULL, : Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:19 UTC"
#> [1] "Identifying missing data patterns..."
...
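The warning above appears because the analysis ran with cores = 1. A minimal sketch for picking a core count with base R's parallel package, assuming you want to leave one core free for the rest of the system when enough are available; n_cores is an illustrative variable name, and the oar() call is shown only as a comment, with the same parameters as above.

```r
library(parallel)

# detectCores() can return NA on some platforms; fall back to a single core
available <- detectCores()
if (is.na(available)) available <- 1L

# Leave one core free when there are more than 2; otherwise use what exists
n_cores <- if (available > 2L) available - 1L else available
n_cores

# Then, for example:
# sc.data <- oar(data = sc.data, seurat_v5 = TRUE, count.filter = 1,
#                blacklisted.genes = NULL, suffix = "",
#                store.hamming = FALSE, cores = n_cores)
```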
To explore the results, we can plot OARscore against pct.missing, coloring the cells by attributes stored in the meta.data of our Seurat object.
library(patchwork)
library(ggplot2)
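p1 <- scatter_score_missing(sc.data, pt.size = 0.5) + NoLegend()
# p2 colors the cells by the "group" metadata column
p2 <- scatter_score_missing(sc.data, pt.size = 0.5, group.by = "group")
# patchwork places the two plots side by side
p1 + p2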

[Figure: Scatter Plots by Factors]
Here we can see that the underlying biological conditions result in very different distributions of OARscore values. In particular, 10 h of starvation produces a completely separate profile. For that reason, running the test separately within each level of this biological variable allows us to prioritize cells within each condition for further analysis.
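Prioritizing within each condition means ranking each cell against cells from the same condition rather than against the whole dataset. A minimal base R sketch of that ranking step, on a made-up table whose column names mirror this vignette (the values are invented), and assuming that higher OARscore values mark the cells to prioritize:

```r
# Hypothetical metadata; with real data this would be sc.data@meta.data
md <- data.frame(
  group    = c("A", "A", "A", "B", "B", "B"),
  OARscore = c(0.2, 1.5, 0.9, 3.1, 2.4, 4.0),
  row.names = paste0("cell_", 1:6)
)

# Rank cells by OARscore within each group (1 = highest score).
# Assumption: higher scores are the ones worth following up.
md$rank.within.group <- ave(-md$OARscore, md$group,
                            FUN = function(x) rank(x, ties.method = "first"))

# Keep the top 2 cells of every group, regardless of the group's overall level
top_cells <- rownames(md)[md$rank.within.group <= 2]
top_cells
```

Ranked globally, all three cells of group B would outrank every cell of group A; ranking within groups keeps candidates from both conditions.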
2. Analysis by Factor
OAR scores are typically most informative when distinguishing among cells of the same type. When a dataset contains diverse cell types or is dramatically affected by a biological variable, it can help to split the data by that factor and run the test independently on each subset. Our wrapper function, oar_by_factor(), does this: it takes the same parameters we discussed before, plus the factor to split by, and returns a Seurat object containing the results of the analysis. Here we set suffix = ".factor" so that the new results are stored in separate columns (for example, OARscore.factor) rather than overwriting those from the full test.
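The split, run, combine pattern that oar_by_factor() wraps can be illustrated in base R. This is a conceptual sketch, not the package's implementation: toy_score() is a hypothetical placeholder that merely centers a value on the median of its own subset (it is not the OAR test), and the data frame stands in for sc.data@meta.data.

```r
# Hypothetical stand-in for a per-subset analysis; NOT the OAR test
toy_score <- function(values) values - median(values)

md <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  value = c(1, 3, 10, 12, 20),
  row.names = paste0("cell_", 1:5)
)

# 1. Split the cells by the factor of interest
by_group <- split(md, md$group)

# 2. Run the analysis independently on each subset, keeping cell names
per_group <- lapply(by_group, function(sub) {
  setNames(toy_score(sub$value), rownames(sub))
})

# 3. Combine, realign with the original cell order, and store the result
#    under a suffixed column name so earlier results are not overwritten
combined <- unlist(unname(per_group))
md[["score.factor"]] <- unname(combined[rownames(md)])
md
```

Each cell is scored against its own group's median (2 for A, 12 for B); a single global median (10) would instead push every cell in group A well below zero.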
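# Split cells by the "group" metadata column and run the test on each subset;
# results are stored with the ".factor" suffix (e.g. OARscore.factor)
sc.data <- oar_by_factor(sc.data, cores = 1, factor = "group", suffix = ".factor")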
#> [1] "Splitting data by specified factor..."
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:36 UTC"
#> [1] "Identifying missing data patterns..."
...
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:45 UTC"
#> [1] "Identifying missing data patterns..."
...
#> Warning in FUN(X[[i]], ...): Running process in fewer than 2 cores will considerably slow down progress
#> [1] "Extracting data..."
#> [1] "Extracting count tables"
#> [1] "Analysis started on:"
#> [1] "2025-08-21 21:34:49 UTC"
#> [1] "Identifying missing data patterns..."
...
Now let us examine the outcome of this analysis:
# suffix = ".factor" selects the results of the by-factor analysis
p3 <- scatter_score_missing(sc.data, pt.size = 0.5, suffix = ".factor") + NoLegend()
p4 <- scatter_score_missing(sc.data, pt.size = 0.5, group.by = "group", suffix = ".factor")
p3 + p4

[Figure: Scatter Plots by Factors]
As you can see, we can now identify transcriptional shifts within each biological condition. We can also visualize these results on the UMAP projection of the cells using Seurat::FeaturePlot().
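# order = TRUE draws high-scoring cells on top; the color scale is
# clipped to the 40th-90th percentiles of OARscore.factor
p5 <- FeaturePlot(
  sc.data, features = "OARscore.factor", order = TRUE, pt.size = 0.5,
  min.cutoff = "q40", max.cutoff = "q90")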
#> Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
#> ℹ Please use the `layer` argument instead.
#> ℹ The deprecated feature was likely used in the Seurat package.
#> Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
p5

[Figure: Results by factor]
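In the FeaturePlot() call above, min.cutoff = "q40" and max.cutoff = "q90" are quantile cutoffs as described in the Seurat documentation: values below the 40th percentile and above the 90th percentile are drawn at those limits, so a few extreme cells do not compress the color scale. A base R sketch of that clipping on made-up scores (an illustration of the idea, not Seurat's internal code):

```r
# Made-up scores for illustration
scores <- (0:10) / 10

# "q40" and "q90" refer to the 40th and 90th percentiles of the plotted values
lower <- quantile(scores, probs = 0.40)
upper <- quantile(scores, probs = 0.90)

# Values outside [lower, upper] are clipped to the nearest cutoff
clipped <- pmin(pmax(scores, lower), upper)
range(clipped)
```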