10X Visium Spatial Transcriptomics Analysis Report

Contract Information	Contract Content
Project Number	NovogeneST_MouseBrain
Project Name	Spatial Transcriptomics Analysis Report (Mouse Brain Tissue)
Report Date	2020-07-23
Report Number	NovogeneST_MouseBrain

Your Company

1 Experimental Workflow

The 10x Genomics Visium^TM spatial transcriptomics is an unbiased detection of gene expression based on intact tissue sections. This technology retains the histological information of the sample by performing H&E staining and imaging on frozen sections placed in specific capture areas on the slide. Subsequently, through permeabilization of the same sample, the RNA is released and hybridizes with the oligonucleotide probes in the capture area, allowing different spots to be labeled with spatial barcodes. The 10X Visium spatial transcriptomics integrates gene expression activity with histological information, presenting a new view of the complexity of tissues and gene expression. The core of the 10X Visium spatial gene expression technology lies in the spatial barcode slide, and the technical principle is specifically demonstrated as follows:

Figure 1.1 Principle of Visium Spatial Gene Expression Technology (In a 6.5 mm X 6.5 mm area, there are nearly 5,000 spots, each containing millions of spatial barcode capture probes. The DNA probes are 5' fixed on the slide, including Illumina sequencing primer sequences, spatial barcode, UMI, and poly(dT). The spatial barcode sequence of each spot is unique, with a diameter of 55 micrometers, capable of capturing RNA from 1 to 10 cells.)

Every step of the 10X Visium spatial transcriptomics experimental workflow, from fresh tissue processing, sample preparation, sample quality control, tissue optimization, library construction, to sequencing, directly affects the quantity and quality of the sequencing data, and thus influences the results of subsequent information analysis. To ensure the accuracy and reliability of the sequencing data from the source, we promise to strictly control all data production processes, ensuring the generation of high-quality data from the very beginning. The overall experimental workflow of the 10x Genomics Visium™ spatial transcriptomics is shown as follows:

Your Company

1.1 Sample Shipment Requirements

Different sample types require different cryopreservation methods. Customers should perform cryopreservation and embedding based on the characteristics of their samples, and can refer to the 10x official sample preparation recommendations for details. Customers are required to provide H&E histological information and RNA integrity test results of the frozen samples.

If customers are unable to provide H&E histological information and RNA integrity test results of the frozen samples, and the quality check fails after the samples are received by the company, the customer will need to bear certain costs. This will also result in a loss of time. Therefore, we recommend that customers perform quality checks on their samples if conditions permit.

1.2 Sample Quality Control

For mailed OCT-embedded tissue blocks, during sectioning, frozen sections are stained with H&E, photographed, and multiple frozen sections are collected for RNA extraction. The following two prerequisites must be met before proceeding with subsequent experimental operations: the histological information of the sample tissue must be intact, and the RNA must be non-degraded (RIN >= 7).

1.3 Tissue Permeabilization

The purpose of tissue permeabilization is to determine the optimal permeabilization time for the sample type of interest, ensuring the release of RNA from the tissue samples for hybridization with the oligonucleotide probes on the slide to perform subsequent reverse transcription to synthesize cDNA. Since different types of samples have varying permeabilization conditions, it is recommended to use the 10X official tissue optimization kit (Visium Spatial Tissue Optimization Kit) to test the permeabilization time of the samples before conducting the formal experiment to determine the optimal permeabilization time. The tissue optimization experimental workflow is shown as follows:

1.4 Library Construction

The library construction process includes the preparation of frozen sections, H&E staining and imaging, permeabilization treatment, cDNA synthesis, collection of cDNA into tubes for PCR amplification, and a series of experimental procedures for library construction. After the library passes the quality check, it is ready for sequencing. The library construction experimental workflow is shown as follows:

1.5 Sequencing

The 10X spatial transcriptomics library adopts the Illumina Hiseq PE150 or NovaSeq PE150 sequencing strategy. It is recommended to sequence 50,000-100,000 reads per spot, which means sequencing 75-100G of data per sample (5000 spots). The sequencing depth is related to the coverage area of the sample capture region on the slide, as well as the sample type and complexity. Increasing the sequencing data volume appropriately can improve the detection rate of genes.

The information of the Visium chip used for library construction is as follows:

Sample Name	Chip Number (slide)	Capture Area (area)
posterior1	V19L29-035	A1
posterior2	V19L29-035	C1

Your Company

2 Bioinformatics Analysis Workflow

After obtaining the raw sequencing reads, the bioinformatics analysis is conducted through the following workflow. The 10X official software Space Ranger is used to integrate gene expression information with the spatial location information of tissue sections, intuitively displaying the spatial distribution of cell clusters and subclusters, as well as the spatial distribution of differentially expressed genes on the tissue sections. Subsequently, functional enrichment analysis is performed on these differentially expressed genes to identify the functional characteristics of the subclusters. In addition to the analysis results from Space Ranger, we also provide basic analysis results using the third-party software Seurat. Furthermore, we can offer customized analysis content such as cell definition, variant detection, copy number analysis, and ligand-receptor analysis according to customer requirements. (*: Customized analysis content is not included in the basic analysis report. Customers who require customized analysis should make separate requests.)

Your Company

3 Data Alignment and Expression Quantification

3.1 Data Alignment (Space Ranger)

Space Ranger is the official software package provided by 10x Genomics for Visium spatial transcriptomics data analysis. Space Ranger takes the raw sequencing data in fastq format and the H&E-stained tissue section images as input data, and performs reference genome alignment, tissue detection, metrics calculation, and barcode/UMI statistics, generating a spot-gene expression matrix.

3.1.1 Reference Genome Alignment

The alignment software used by Space Ranger is STAR. After aligning reads to the reference genome, annotations are made using a GTF file, and reads are categorized into exon, intron, and intergenic regions. The specific rules for categorization are as follows: reads with at least 50% alignment to exons are classified as exon reads; reads that align to non-exon regions but intersect with introns are classified as intron reads; all others are classified as intergenic.

The basic data analysis using Space Ranger mainly includes the following results: statistics on the number of spots covered by the tissue section, the number of detected genes, sequencing data output and quality statistics, and reference genome alignment status. The statistical results for each sample in this project are as follows:

Sample Name	Data Statistics
posterior1	web_summary.html
posterior2	web_summary.html

3.1.2 Image Processing

Space Ranger detects spots covered by the tissue section, and only barcodes corresponding to spots covered by the tissue section are used for downstream analysis. Space Ranger relies on image processing algorithms to address two key issues related to the tissue section image: determining the placement of the tissue and aligning the fiducial pattern. Tissue detection is used to identify which capture points, and thus which barcodes, will be used for analysis. Fiducial alignment allows Space Ranger to know the location of each barcode-spot in the image, as the setup for imaging the Visium capture area may vary slightly for each user.

Space Ranger first extracts features that appear to be fiducials from the tissue section image, and then attempts to match these potential fiducials to the known fiducial pattern. The fiducials extracted from the image may have missing information, such as areas covered by the tissue, and may also have false positives, such as tissue debris or residue being identified as fiducials. The output of fiducial alignment is a coordinate transformation that associates the Visium barcode spot pattern with the user's tissue image.

Each area on the Visium slide contains 4,992 capture points. In practice, only a portion of these points are covered by the tissue. To restrict Space Ranger's analysis to only those points where the tissue is placed, Space Ranger uses an algorithm to identify the tissue in the bright-field image. Using the grayscale values of the input image, multiple samplings can be performed to calculate and compare multiple estimates of where the tissue section is placed. These estimates are used to train a statistical classifier to label each pixel within the capture area as either tissue or background.

posterior1_HE_Staining posterior1_tissue_Spatial posterior2_HE_Staining posterior2_tissue_Spatial

Figure 3.1 Image Alignment and Tissue Sample Detection

HE_Staining: H&E-stained tissue section image, tissue_Spatial: Spatial distribution map of tissue detection

Your Company

3.2 Expression Quantification

We used the SpaceRanger software to count the UMI of each gene in every spot to obtain the spot-gene expression matrix of the sample. Basic quality control was performed on the data. The figure below shows the distribution of UMI/Gene counts across the tissue space.

posterior1_UMI_Spatial posterior1_Gene_Spatial posterior2_UMI_Spatial posterior2_Gene_Spatial

Figure 3.2 Gene Expression Quantification of Spatial Spots

UMI_Spatial: Spatial distribution map of UMI counts, Gene_Spatial: Spatial distribution map of detected gene counts

Your Company

4 Space Ranger Analysis

The Space Ranger software can directly perform downstream analysis on the gene expression matrix, which mainly includes PCA dimensionality reduction, clustering analysis, t-SNE dimensionality reduction, and UMAP dimensionality reduction. We conduct secondary analysis and visualization based on these results. Principal Component Analysis (PCA) is a method that applies variance decomposition to reduce the dimensions of multidimensional data and extract the most significant structures within the data. t-SNE (t-Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) are currently very popular nonlinear, unsupervised dimensionality reduction methods in the field of machine learning, capable of preserving the original data's characteristics to the greatest extent while significantly reducing the number of features. Space Ranger first performs PCA using the IRLBA algorithm, reducing the dataset's dimensions from (spot x genes) to (spot x M, where M is the number of principal components). It then applies the nonlinear dimensionality reduction algorithms t-SNE and UMAP to display the reduced data in a two-dimensional space. The more similar the gene expression patterns between cells, the closer their distances in the two-dimensional space of t-SNE or UMAP. For clustering analysis, Space Ranger uses the following two methods.

4.1 Graphclust Clustering Analysis

The graph-based clustering algorithm consists of two steps: First, a k-nearest neighbor sparse matrix is constructed between spots using the PCA-reduced data, grouping a spot with its k nearest spots based on Euclidean distance. Then, the Louvain algorithm is applied to optimize modularity, aiming to identify highly connected modules within the graph. Finally, hierarchical clustering is used to merge clusters within the same region that do not have differentially expressed genes (with a BH-corrected p-value below 0.05). This process is repeated until no more clusters can be merged.

posterior1_graphclust_Spatial posterior1_graphclust_tSNE posterior1_graphclust_UMAP posterior2_graphclust_Spatial posterior2_graphclust_tSNE posterior2_graphclust_UMAP

Figure 4.1 Graph-based Classification Results

graphclust_Spatial: Spatial distribution map of clusters, graphclust_tSNE: t-SNE two-dimensional display of clusters, graphclust_UMAP: UMAP two-dimensional display of clusters

Your Company

4.2 K-means Clustering Analysis

The K-Means algorithm randomly selects k cluster centroids in the PCA-reduced space. For each spot, it calculates which cluster it should belong to. Then, for each cluster, it recalculates the centroid of that cluster. This process is repeated until convergence. Note that the k here is the pre-specified number of subclusters (which is different from the k in the graph-based clustering algorithm), and the centroid represents the estimated center point of spots belonging to the same cluster. We have provided clustering results for nine different k values (k ranging from 2 to 10). The detailed results are as follows.

posterior1_kmeans_Spatial posterior2_kmeans_Spatial

Figure 4.2 Spatial Distribution of K-Means Clustering (k=2 to k=10)

posterior1_kmeans_tSNE posterior1_kmeans_UMAP posterior2_kmeans_tSNE posterior2_kmeans_UMAP

Figure 4.3 t-SNE and UMAP Distribution of K-Means Clustering (k=2 to k=10)

kmeans_tSNE: t-SNE two-dimensional display of clusters, kmeans_UMAP: UMAP two-dimensional display of clusters

Your Company

5 Seurat Analysis

Seurat is a widely used third-party tool for analyzing 10x spatial transcriptomics or single-cell transcriptomics data. Based on the Seurat software package, we have conducted secondary development to perform the following analysis steps on the gene expression matrix at the spot level: (1) Quality control of the data and removal of low-quality spots that may interfere with downstream analysis; (2) Normalization of the data to eliminate capture size bias; (3) Feature selection to identify genes with large expression differences for downstream analysis; (4) Dimensionality reduction using principal component analysis; (5) Clustering analysis of spots using a graph-based algorithm; (6) Dimensionality reduction display of clustering results using t-SNE and UMAP algorithms; (7) Identification of spatially differentially distributed genes.

5.1 Data Filtering

After obtaining the gene expression matrix generated by SpaceRanger, we will filter the spots again based on indicators such as the number of detected genes and the proportion of mitochondrial UMI, removing outliers to ensure the reliability and accuracy of the subsequent analysis results.

posterior1_MT_Spatial posterior1_Filter_Stat posterior1_Filter_Spatial posterior2_MT_Spatial posterior2_Filter_Stat posterior2_Filter_Spatial

Figure 5.1 Filtering Spots Based on Gene Detection and Mitochondrial Gene Proportion

MT_Spatial: Spatial distribution map of mitochondrial gene proportion, Filter_Stat: Distribution of spots filtered under different conditions, Filter_Spatial: Spatial distribution map of spot filtering

Distribution of spots filtered under different conditions: The x-axis of each figure represents the statistics of different filtering indicators, and the y-axis represents the number of spots. The blue line indicates the filtering threshold, and the gray area represents the spots that have been filtered out.

Your Company

5.2 Identification of Highly Variable Genes

Seurat uses the sctransform (SCT) algorithm for data normalization, and then employs the vst algorithm to select the set of genes with the greatest expression variability across the entire dataset, identified as highly variable genes (HVGs). The red points in the figure below represent the highly variable genes, totaling 3,000. The selected highly variable genes are used for principal component analysis and downstream analyses.

posterior1 posterior2

Figure 5.2 Volcano Plot of Highly Variable Genes

The x-axis represents the geometric mean of gene expression levels, and the y-axis represents the variance/variability statistics, with the top 10 highly variable genes marked.

Your Company

5.3 PCA Analysis

During PCA analysis, Seurat can calculate the standard deviation and P-value of principal components. We can select an appropriate number of principal components based on the standard deviation and P-value of the principal components.

posterior1_PCA_sdev posterior1_PCA_JackStraw posterior2_PCA_sdev posterior2_PCA_JackStraw

Figure 5.3 PCA Analysis Display

PCA_sdev: Display of standard deviation statistics for each principal component, with the x-axis representing the principal components and the y-axis representing the standard deviation.

PCA_JackStraw: JackStraw assessment of principal components for reference in selecting principal components, typically choosing principal components with a p-value < 0.05 for subsequent analysis.

Your Company

5.4 Clustering and Dimensionality Reduction Analysis

Seurat employs a Graph-based method for clustering analysis, similar to the algorithm used in Space Ranger. The clustering results are displayed through tissue sections and t-SNE and UMAP dimensionality reduction.

posterior1_cluster_Spatial posterior1_cluster_tSNE posterior1_cluster_UMAP posterior2_cluster_Spatial posterior2_cluster_tSNE posterior2_cluster_UMAP

Figure 5.4 Clustering Analysis Display: Spatial Distribution of Clusters, t-SNE Distribution, UMAP Distribution

cluster_Spatial: Spatial distribution map of clusters, cluster_tSNE: t-SNE two-dimensional display of clusters, cluster_UMAP: UMAP two-dimensional display of clusters

Your Company

5.5 Spatially Specific Gene Expression Analysis

The Seurat software provides a method for identifying spatially specific gene expression that does not rely on clustering results. This method (markvariogram) can measure the dependence of gene expression on its spatial location, and by setting certain thresholds, spatially specific gene expression can be detected. We have displayed the top 6 genes with spatially differential distribution.

posterior1 posterior2

Figure 5.5 Display of Spatially Specific Gene Expression

Your Company

6 Differential Expression Analysis

We provide clustering results from multiple clustering algorithms, allowing customers to select appropriate results based on their research objectives and biological significance for subsequent analysis and scientific research. For different clustering results, we use different differential analysis methods: For clustering results from Space Ranger, we use edgeR for differential analysis; for clustering results from Seurat, we use the Wilcoxon rank-sum test (wilcox method). Both methods utilize the default algorithms provided by their respective software packages. By comparing one subcluster of a sample with all other subclusters of the same sample, we obtain a list of differentially expressed genes between that subcluster and the others. Based on these differentially expressed genes, we identify genes with higher expression in that subcluster compared to others as candidate marker genes, i.e., those with a logFC greater than 0. Differential expression results for each cluster are presented in various formats, with the Seurat clustering results of the first sample listed in the information collection form used as an example in the final report.

6.1 Differential Analysis Results

gene_id	gene	cluster	avg_logFC	p_val	p_val_adj	pct.1	pct.2
ENSMUSG00000031425	Plp1	1	2.04208632615082	3.05033594931447e-234	5.51134699322139e-230	1	0.902
ENSMUSG00000041607	Mbp	1	1.67576480159476	8.82858058593994e-231	1.59514794026763e-226	1	0.998
ENSMUSG00000032554	Trf	1	1.68398011189939	7.47977290894177e-228	1.3514453691876e-223	1	0.819
ENSMUSG00000032517	Mobp	1	1.64413483836605	5.14147588494275e-226	9.28961862891456e-222	1	0.794

(1) gene_id: Gene ID

(2) gene: Gene name

(3) cluster: Cluster

(4) avg_logFC: Average log2 fold change in gene expression

(5) p_val: Differential analysis p-value

(6) p_val_adj: Adjusted p-value

(7) pct.1: Proportion of spots expressing the gene in the cluster

(8) pct.2: Proportion of spots expressing the gene in other clusters

6.2 Differential Gene Clustering Heatmap Display

A heatmap is generated to display the top differentially expressed genes identified in each cluster. The expression levels of these genes are higher within the same cluster compared to other groups. The log10(UMI+1) values are normalized and clustered, with yellow indicating high expression and purple indicating low expression in the heatmap. The clustering heatmap is shown in the figure below:

posterior1_cluster_heatmap posterior2_cluster_heatmap

Figure 6.1 Heatmap Display of Differentially Expressed Genes

Your Company

6.3 Spatial Distribution Display of Differentially Expressed Genes

We have presented the top 4 differentially expressed genes for each cluster in various formats. The final report only displays the results for one cluster. The figure below shows the spatial distribution of differentially expressed genes.

posterior1_cluster0 posterior2_cluster0

Figure 6.2 Spatial Distribution Display of Differentially Expressed Genes

Your Company

6.4 Violin Plot Display of Differentially Expressed Genes

We have presented the top 4 differentially expressed genes for each cluster in various formats. The final report only displays the results for one cluster. The figure below shows the Violin plot display of differentially expressed gene expression.

posterior1_cluster0 posterior2_cluster0

Figure 6.3 Violin Plot of Differentially Expressed Genes

The figure displays 4 marker genes, with the x-axis representing cluster numbers and the y-axis representing the average expression levels of the genes.

Your Company

6.5 t-SNE/UMAP Dimensionality Reduction Display of Differentially Expressed Genes

We have presented the top 4 differentially expressed genes for each cluster in various formats. The final report only displays the results for one cluster. The figure below shows the t-SNE and UMAP dimensionality reduction display of differentially expressed genes.

posterior1_cluster0_gene_tSNE posterior1_cluster0_gene_UMAP posterior2_cluster0_gene_tSNE posterior2_cluster0_gene_UMAP

Figure 6.4 t-SNE and UMAP Dimensionality Reduction Display of Differentially Expressed Genes

gene_tSNE: t-SNE two-dimensional display of gene expression, gene_UMAP: UMAP two-dimensional display of gene expression

Your Company

7 Functional Enrichment Analysis

By performing enrichment analysis on the differentially highly expressed genes in each cluster from different clustering methods, we can identify which biological functions or pathways are significantly associated with the differentially highly expressed genes in each cluster. We use the clusterProfiler software to conduct GO functional enrichment analysis and KEGG pathway enrichment analysis on the differentially highly expressed gene sets of each cluster. This method is based on the principle of the hypergeometric distribution for gene set enrichment analysis (Gene Set Enrichment, GSE). The tables displayed in the final report are the enrichment analysis results of the differentially highly expressed gene set of cluster 1 obtained from the Seurat clustering results of the first sample listed in the information collection form.

7.1 GO Functional Enrichment

GO (Gene Ontology) is a comprehensive database for describing gene functions, which can be divided into three parts: molecular function, biological process, and cellular component. GO enrichment analysis considers a corrected p-value less than 0.05 as significant enrichment, and the results are shown as follows:

Category	ID	Description	GeneRatio	BgRatio	pvalue	padj
GO	GO:0043209	myelin sheath	11/36	189/22260	6.15762409458533e-15	3.60836771942701e-12
GO	GO:0019911	structural constituent of myelin sheath	5/36	10/22260	2.07472779682085e-12	6.07895244468508e-10
GO	GO:0042552	myelination	6/36	56/22260	3.53478661298525e-10	6.90461651736452e-08
GO	GO:0043218	compact myelin	3/36	5/22260	3.87584246183853e-08	5.67810920659345e-06

(1) Category: Database name

(2) ID: GO ID

(3) Description: GO description

(4) GeneRatio: Differential gene ratio (number of differentially expressed genes in the GO term / total number of differentially expressed genes)

(5) BgRatio: Background gene ratio (number of background genes in the GO term / total number of background genes)

(6) pvalue: p-value for enrichment analysis

(7) p.adjust: Adjusted p-value

The results of the GO enrichment analysis are displayed in various formats, including a heatmap of enrichment at the cluster level, bar charts of enriched GO terms, scatter plots of enriched GO terms, and network connection diagrams of enriched GO terms.

posterior1_GO_heatmap posterior2_GO_heatmap

Figure 7.1 Heatmap of GO Enrichment Analysis

Each row represents a GO function, and each column represents a cluster number. The intensity of the color indicates the p-value (adjusted) of the enrichment analysis.

Your Company

posterior1_cluster0_GO_barplot posterior1_cluster0_GO_dotplot posterior1_cluster0_GO_emapplot posterior2_cluster1_GO_barplot posterior2_cluster1_GO_dotplot posterior2_cluster1_GO_emapplot

Figure 7.2 GO Enrichment Analysis Barplot, Dotplot, and Network Connection Map (emapplot)

Barplot: The x-axis represents the number of differentially expressed genes, and the y-axis represents GO functions. The intensity of the color indicates the p-value (adjusted) of the enrichment.

Dotplot: The x-axis represents the proportion of differentially expressed genes, and the y-axis represents GO functions. The size of the dot indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Network Connection Map: Nodes represent GO functions, and the thickness of the lines indicates the number of shared differentially expressed genes between pairs of GO terms. The size of the node indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Your Company

7.2 KEGG Pathway Enrichment

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive database that integrates genomic, chemical, and systems functional information. KEGG enrichment analysis considers a corrected p-value less than 0.05 as significant enrichment, and the results are shown as follows:

Category	ID	Description	GeneRatio	BgRatio	pvalue	padj
KEGG	mmu04971	Gastric acid secretion	3/16	76/8788	0.000321030811576264	0.0266455573608299
KEGG	mmu04972	Pancreatic secretion	3/16	109/8788	0.000924139482649758	0.0383517885299649
KEGG	mmu04216	Ferroptosis	2/16	41/8788	0.00244516858191423	0.067649664099627
KEGG	mmu04978	Mineral absorption	2/16	54/8788	0.00420869180250637	0.0873303549020072

(1) Category: Database name

(2) ID: KEGG ID

(3) Description: KEGG description

(4) GeneRatio: Differential gene ratio (number of differentially expressed genes in the KEGG pathway / total number of differentially expressed genes)

(5) BgRatio: Background gene ratio (number of background genes in the KEGG pathway / total number of background genes)

(6) pvalue: p-value for enrichment analysis

(7) p.adjust: Adjusted p-value

The results of the KEGG pathway enrichment analysis are displayed in various formats, including a heatmap of enrichment at the cluster level, bar charts of enriched KEGG pathways, scatter plots of enriched KEGG pathways, and network connection diagrams of enriched KEGG pathways.

posterior1_KEGG_heatmap posterior2_KEGG_heatmap

Figure 7.3 Heatmap of KEGG Enrichment Analysis

Each row represents a KEGG pathway, and each column represents a cluster number. The intensity of the color indicates the p-value (adjusted) of the enrichment analysis.

Your Company

posterior1_cluster1_KEGG_barplot posterior1_cluster1_KEGG_dotplot posterior1_cluster1_KEGG_emapplot posterior2_cluster1_KEGG_barplot posterior2_cluster1_KEGG_dotplot posterior2_cluster1_KEGG_emapplot

Figure 7.4 KEGG Enrichment Analysis Barplot, Dotplot, and Network Connection Map (emapplot)

Barplot: The x-axis represents the number of differentially expressed genes, and the y-axis represents KEGG pathways. The intensity of the color indicates the p-value (adjusted) of the enrichment.

Dotplot: The x-axis represents the proportion of differentially expressed genes, and the y-axis represents KEGG pathways. The size of the dot indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Network Connection Map: Nodes represent KEGG pathways, and the thickness of the lines indicates the number of shared differentially expressed genes between pairs of KEGG pathways. The size of the node indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Your Company

7.3 Reactome Pathway Enrichment

The Reactome database compiles human reactions and biological pathways. Reactome enrichment analysis considers a padj value less than 0.05 as significant enrichment, and the results are shown as follows:

Category	ID	Description	GeneRatio	BgRatio	pvalue	padj
REACTOME	R-MMU-3371571	Mus musculus: HSF1-dependent transactivation	2/18	22/8669	0.000917816629867246	0.109220178954202
REACTOME	R-MMU-442755	Mus musculus: Activation of NMDA receptors and postsynaptic events	2/18	35/8669	0.00232659397936183	0.138432341772029
REACTOME	R-MMU-5578775	Mus musculus: Ion homeostasis	2/18	50/8669	0.00470261670265493	0.146233284883189
REACTOME	R-MMU-917937	Mus musculus: Iron uptake and transport	2/18	56/8669	0.00586849262595769	0.146233284883189

(1) Category: Database name

(2) ID: Reactome pathway ID

(3) Description: Description of the Reactome pathway

(4) GeneRatio: Differential gene ratio (number of differentially expressed genes in the Reactome pathway / total number of differentially expressed genes)

(5) BgRatio: Background gene ratio (number of background genes in the Reactome pathway / total number of background genes)

(6) pvalue: p-value for enrichment analysis

(7) p.adjust: Adjusted p-value

The results of the Reactome pathway enrichment analysis are displayed in various formats, including a heatmap of enrichment at the cluster level, bar charts of enriched Reactome pathways, scatter plots of enriched Reactome pathways, and network connection diagrams of enriched Reactome pathways.

posterior1_reactome_heatmap posterior2_reactome_heatmap

Figure 7.5 Heatmap of Reactome Pathway Enrichment Analysis

Each row represents a Reactome pathway, and each column represents a cluster number. The intensity of the color indicates the p-value (adjusted) of the enrichment analysis.

Your Company

posterior1_cluster1_reactome_barplot posterior1_cluster1_reactome_dotplot posterior1_cluster1_reactome_emapplot posterior2_cluster1_reactome_barplot posterior2_cluster1_reactome_dotplot posterior2_cluster1_reactome_emapplot

Figure 7.6 Reactome Enrichment Analysis Barplot, Dotplot, and Network Connection Map (emapplot)

Barplot: The x-axis represents the number of differentially expressed genes, and the y-axis represents Reactome pathways. The intensity of the color indicates the p-value (adjusted) of the enrichment.

Dotplot: The x-axis represents the proportion of differentially expressed genes, and the y-axis represents Reactome pathways. The size of the dot indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Network Connection Map: Nodes represent Reactome pathways, and the thickness of the lines indicates the number of shared differentially expressed genes between pairs of Reactome pathways. The size of the node indicates the number of differentially expressed genes, and the intensity of the color indicates the p-value (adjusted) of the enrichment.

Your Company

Notes

List of Analysis Software and Versions

Analysis Content	Software	Version
Quality Control	In-house	1.0.0
Alignment and Quantification	SpaceRanger	spaceranger-1.1.0
Dimensionality Reduction and Clustering	SpaceRanger	spaceranger-1.1.0
Dimensionality Reduction and Clustering	Seurat	3.1.2
Differential Analysis	edgeR	3.24.3
Enrichment Analysis	clusterProfiler	3.10.1

Methods

To facilitate the writing of your manuscript, we have prepared an English version of the methods for your reference.

Loupe Browser Instructions

To assist you in viewing the cloupe files in the result files, we have provided the Chinese user manual for the Loupe Browser visualization software, which can be used for clustering, genotyping, differential gene screening, and heatmap drawing.

Your Company

References

1. Westholm, J.O., Huss, M., et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78-82. (Visium)
2. Dobin A, Davis C A, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner[J]. Bioinformatics, 2013, 29(1): 15-21. (STAR)
3. Becht, E., McInnes, L., Healy, J. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37, 38–44 (2019). (UMAP)
4. Van der Maaten, L., Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15, 3221-3245 (2014). (t-SNE)
5. Butler, A., Hoffman, P., Smibert, P., Papalexi, E., & Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411–420. (Seurat)
6. Stuart,T.et al.Comprehensive Integration of Single-Cell Data. Cell, 177, 1888–1902.e21 (2019). (Seurat)
7. Edsgärd, D., Johnsson, P., & Sandberg, R. (2018). Identification of spatial expression trends in single-cell gene expression data. Nature Methods, 15(5), 339–342. doi:10.1038/nmeth.4634. (Seurat)
8. Korsunsky I., Millard N., Fan J., Slowikowski K., Zhang F., Wei K., Baglaenko Y., Brenner M., Loh P.R., Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods. 2019;16:1289-1296. (Harmony)
9. Love, M. L., Huber, W. & Anders, S., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, number 550 (2014). (DESeq2)
10. Robinson M D, McCarthy D J, Smyth G K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data[J]. Bioinformatics, 2010, 26(1): 139-140. (edgeR)
11. Yu, D., Huber, W. & Vitek, O., Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics 29, 1275–1282 (2013).
12. Robinson, M. D. & Smyth, G. K. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321–332 (2007).
13. Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287. (clusterProfiler)
14. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes[J]. Nucleic acids research, 2000, 28(1): 27-30. (KEGG)
15. G Yu, QY He*. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 2016, 12(2):477-479. (Reactome)
16. Aran D, Looney AP, Liu L, Wu E, Fong V, Hsu A. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology. 2019;20:163–72. (SingleR)
17. https://rdrr.io/cran/clustermole/ (clustermole)
18. Zhang X, Lan Y, Xu J, Quan F, Zhao E, Deng C. et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Research. 2018;47:D721–D8. (CellMarker)
19. Franzén, O., Gan, L.-M. & Björkegren, J. L. M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database2019, baz046 (2019). (PanglaoDB)
20. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).（Monocle2
21. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018). (CellPhoneDB)
22. Kumar, M. P. et al. Analysis of SingleCell RNA-Seq Identifies Cell-Cell Communication Associated with Tumor Characteristics. Cell Rep. 2018; 25, 1458–1468.e4. (CellPhoneDB)
23. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014). (inferCNV)
24. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). (WGCNA)
25. Hanahan,D. and Weinberg,R.A. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646–674. (Hallmark)
26. Hanahan, D., and Weinberg, R.A. (2000). The hallmarks of cancer. Cell 100, 57–70. (Hallmark)
27. Hanzelmann,S., Castelo,R. and Guinney,J. (2013) GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics, 14, 7. (GSVA)
28. Liberzon A, et al. The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 2015;1:417–425. doi: 10.1016/j.cels.2015.12.004. (MSigDB)
29. Qian, X., Harris, K.D., Hauling, T., Nicoloutsopoulos, D., Mun ̃ oz-Manchado, A.B., Skene, N., Hjerling-Leffler, J., and Nilsson, M. (2019). Probabilistic cell typing enables fine mapping of closely related cell types in situ. Nat. Methods. Published online November 18, 2019. (pciSeq)