{"id":50459,"date":"2026-04-14T12:36:23","date_gmt":"2026-04-14T12:36:23","guid":{"rendered":"https:\/\/foreignnewstoday.com\/?p=50459"},"modified":"2026-04-14T12:36:23","modified_gmt":"2026-04-14T12:36:23","slug":"multiomics-and-deep-learning-dissect-regulatory-syntax-in-human-development","status":"publish","type":"post","link":"https:\/\/foreignnewstoday.com\/?p=50459","title":{"rendered":"Multiomics and deep learning dissect regulatory syntax in human development"},"content":{"rendered":"<p><br \/>\n<\/p>\n<div id=\"Sec11-content\">\n<h3 class=\"c-article__sub-heading\" id=\"Sec12\">Ethics statement<\/h3>\n<p>De-identified tissue samples were collected at Stanford University School of Medicine from elective termination of pregnancy procedures with informed consent for the research use of tissues in observance of relevant legal and institutional ethical regulations. No demographic information was collected. Consent was obtained by the medical team. The relevant tissue sample processing and analyses were performed under protocol SCRO-796, approved by the Stem Cell Research Oversight Panel (SCRO) at Stanford.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec13\">Sample collection and nuclei isolation<\/h3>\n<p>Tissue samples were delivered on ice and immediately stored in liquid nitrogen prior to processing. A multi-tissue compatible nuclei isolation protocol was developed to efficiently isolate stable nuclei for further library preparation. In brief, for a given sample, 100\u2013200\u2009mg of tissue was added directly into 1\u2009ml of Nuclei Extraction Buffer (250\u2009mM Sucrose, 25\u2009mM KCl, 5\u2009mM MgCl<sub>2<\/sub>, 20\u2009mM HEPES-KOH, 65\u2009mM \u03b2-glycerol, 0.5% IGEPAL CA-630, 1\u00d7 protease inhibitor, 1\u2009mM DTT, 0.2\u2009mM Spermine, 0.5\u2009mM Spermidine, 60\u2009U\u2009ml<sup>\u22121<\/sup> RNasin Plus, 2\u20135% normal goat serum) in a chilled 2\u2009ml dounce homogenizer (Kimble 885300-0002) on ice. The sample was incubated for 10\u2009min on ice. The sample was dounced 20 times each with pestle A then with pestle B. Sample was transferred to a DNA low binding tube. Three hundred\u2009\u00b5l additional Nuclei Extraction Buffer was used to rinse any remaining nuclei from dounce homogenizer. Sample was incubated with vertical rotation for 5\u2009min at 4\u2009\u00b0C. Sample was filtered using a 70-\u00b5m Flowmi strainer. Volume was adjusted with additional Nuclei Extraction Buffer to 1.2\u2009ml total volume. Thirty-seven per cent formaldehyde was added to the sample for a 0.2% final formaldehyde concentration and incubated for 4\u2009min at room temperature with vertical rotation. Fixation was quenched with 125\u2009mM glycine for 8\u2009min at room temperature with vertical rotation. Nuclei Extraction Buffer was added to the sample for a final volume of 1.4\u2009ml. An iodixanol gradient was prepared to enrich nuclei from homogenate. In brief, 50% iodixanol solution was prepared from 60% iodixanol with the addition of 1\u2009mM DTT, 60\u2009U\u2009ml<sup>\u22121<\/sup> RNasin Plus, and 2\u20135% normal goat serum. The sample was mixed with an appropriate amount of iodixanol for a final 22% iodixanol concentration. 44% iodixanol solution was layered below the sample. Then, a 22% iodixanol solution was gently added between the sample and the 44% iodixanol solution layer. The sample was centrifuged at 3,500<i>g<\/i> for 30\u2009min at 4\u2009\u00b0C with brakes off. The nuclei layer was separated with gentle pipetting for further processing.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec14\">SHARE-seq library preparation<\/h3>\n<p>The full protocol is described in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">1<\/a>, adapted from published SHARE-seq protocols<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 21\" title=\"Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103&#x2013;1116 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR21\" id=\"ref-link-section-d9580035e2462\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 91\" title=\"Kim, S. H. et al. in Chromatin Accessibility (eds Marinov, G. K. &amp; Greenleaf, W. J.) 187&#x2013;230 (Springer, 2023).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR91\" id=\"ref-link-section-d9580035e2465\">91<\/a><\/sup>. In total, we processed 76 tissue samples derived from 23 individuals, across 12 tissue processing and SHARE-seq library preparation batches, where each batch corresponded to all samples of a given organ.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec15\">Library sequencing<\/h3>\n<p>All DNA libraries were sequenced on a NovaSeq 6000 using 300-cycle S4 v1.5 reagent kits with XP workflow. Paired-end sequencing was run with a 96-99-8-96 configuration (Read1-Index1-Index2-Read2). We quantified DNA libraries using Qubit and Tapestation, then prepared library pools at 1.5\u2009nM concentration for a final loading concentration of 300\u2009pM. Sequencing was performed at the Stanford Genome Technology Center.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec16\">VISTA embryo histology<\/h3>\n<p>We received X-Gal-stained and fixed whole mouse embryos in PBS from L. Pennachio<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 26\" title=\"Kosicki, M. et al. VISTA Enhancer browser: an updated database of tissue-specific developmental enhancers. Nucleic Acids Res. 53, D324&#x2013;D330 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR26\" id=\"ref-link-section-d9580035e2486\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Visel, A., Minovitsky, S., Dubchak, I. &amp; Pennacchio, L. A. VISTA Enhancer Browser&#x2013;a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88&#x2013;92 (2007).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR27\" id=\"ref-link-section-d9580035e2489\">27<\/a><\/sup> and transferred them to 70% ethanol for storage. Paraffin embedding was performed by Histo-Tec Laboratory using a xylene-free dehydration protocol as xylene could dissolve the X-Gal stain. In brief, the embryos were sequentially dehydrated with 80%, 95%, 100%, 100% and 100% ethanol for 20\u2009min each, followed by washes with 50:50, 80:20, 90:10 and 100:0 paraffin:alcohol mix for 20\u2009min each to remove the ethanol. Subsequent embedding and H&amp;E staining was performed with standard protocols on 5-\u03bcm sections.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec17\">SHARE-seq data pre-processing<\/h3>\n<p>We developed a highly parallelized, rapid, and storage-efficient pre-processing Snakemake (v7.15.1)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 92\" title=\"M&#xF6;lder, F. et al. Sustainable data analysis with Snakemake. F1000Research 10, 33 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR92\" id=\"ref-link-section-d9580035e2501\">92<\/a><\/sup> pipeline to convert BCL files from sequencers to ATAC fragment files and RNA sparse matrices (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#Fig7\">1<\/a>). In brief, raw BCL files were first converted to FASTQ files using a custom script that parallelizes the bcl2fastq (v2.20.0.422, Illumina) conversion by flow cell tiles, parses the read cycles, and demultiplexes the raw FASTQ files into sublibraries based on sublibrary barcodes in the Index2 reads. For each sublibrary, we further split the FASTQ file into random chunks of 20 million reads.<\/p>\n<p>Within each chunk of an ATAC\u2013seq sublibrary, we performed barcode matching against the SHARE-seq barcode whitelist, allowing for 1\u2009bp mismatch for each of the three rounds of 8\u2009bp barcodes that make up a single-cell barcode, followed by Nextera adapter trimming with fastp (v0.23.2)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 93\" title=\"Chen, S., Zhou, Y., Chen, Y. &amp; Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884&#x2013;i890 (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR93\" id=\"ref-link-section-d9580035e2511\">93<\/a><\/sup>, genome alignment with Bowtie2 (v2.5.0)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 94\" title=\"Langmead, B. &amp; Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR94\" id=\"ref-link-section-d9580035e2515\">94<\/a><\/sup>, and conversion of the output BAM file to a more storage-efficient fragment file. We then merged the fragment files from all chunks of a sublibrary, deduplicated fragments per cell based on start and end coordinates, and demultiplexed the fragments into samples based on round 1 cell barcodes. Finally, for each sample, we merged the demultiplexed fragment files for that sample across all sublibraries to generate the final ATAC\u2013seq fragment files (*.fragments.tsv.gz, *.fragments.tsv.gz.tbi).<\/p>\n<p>Within each chunk of an RNA sublibrary, we performed barcode matching, 10\u2009bp UMI parsing from Read2, and adapter trimming for Read1 only, followed by genome alignment with STAR (v2.5.4b)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 95\" title=\"Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15&#x2013;21 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR95\" id=\"ref-link-section-d9580035e2522\">95<\/a><\/sup>, gene annotation with featureCounts (v2.0.1)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 96\" title=\"Liao, Y., Smyth, G. K. &amp; Shi, W. eatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923&#x2013;930 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR96\" id=\"ref-link-section-d9580035e2526\">96<\/a><\/sup>, and conversion of the output BAM file to a more storage-efficient TSV format. We then merged the annotated TSV files from all chunks of a sublibrary, split into 12 barcode chunks based on round 3 barcodes, deduplicated UMIs per cell per annotated gene per barcode chunk using UMI-tools (v1.1.2)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 97\" title=\"Smith, T., Heger, A. &amp; Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27 491&#x2013;499 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR97\" id=\"ref-link-section-d9580035e2530\">97<\/a><\/sup>, demultiplexed the deduplicated TSV files into samples based on round 1 cell barcodes, and converted the TSV files into the Matrix Market Exchange format. Finally, for each sample, we merged the demultiplexed Matrix Market Exchange files for that sample across all sublibraries to generate the final RNA sparse matrix files (*.matrix.mtx.gz, *.features.tsv.gz, *.barcodes.tsv.gz).<\/p>\n<p>On average, we can process a 10B-read NovaSeq run in under 4\u2009h using an academic high performance computing cluster. This pipeline can be easily adapted to process other split-and-pool-based single-cell multiomic data. All libraries were aligned to the hg38 reference genome. The pipeline is available at <a href=\"https:\/\/github.com\/GreenleafLab\/shareseq-pipeline\">https:\/\/github.com\/GreenleafLab\/shareseq-pipeline<\/a> (stable release v1.0.0). Raw sequencing reads have been anonymized using BAMboozle<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 98\" title=\"Ziegenhain, C. &amp; Sandberg, R. BAMboozle removes genetic variation from human sequence data for open data sharing. Nat. Commun. 12, 6216 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR98\" id=\"ref-link-section-d9580035e2544\">98<\/a><\/sup> prior to public deposition to protect donor privacy. The anonymization code is available on the \u2018anonymize\u2019 branch of the shareseq-pipeline GitHub repository.<\/p>\n<p>It is well known that in ATAC\u2013seq experiments, Tn5 transposase forms a homodimer with a 9-bp gap between the two Tn5 molecules, resulting in two insertions 9\u2009bp apart on different DNA strands per accessible site<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 4\" title=\"Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. &amp; Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213&#x2013;1218 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR4\" id=\"ref-link-section-d9580035e2552\">4<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 99\" title=\"Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119 (2010).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR99\" id=\"ref-link-section-d9580035e2555\">99<\/a><\/sup>. When sequencing the DNA fragments using paired-end sequencing, the start and end positions need to be adjusted based on the insertion offset of Tn5 to reflect the true centre of the accessible site at the midpoint of the Tn5 dimer. To account for the Tn5 offset, previous ATAC\u2013seq studies used a\u2009+\u20094\/\u22125 offset approach where plus-stranded insertions are adjusted by +4\u2009bp, and minus-stranded insertions by \u22125 bp. However, this in fact results in a 1\u2009bp mismatch of the adjusted insertion sites between the two fragments sharing a single transposition event (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#Fig7\">1b<\/a>). The discrepancy may stem from the end-exclusive coordinate system used by BAM and BED files, as the original +4\/\u22125 convention is only correct if the output file is interpreted in a non-standard 0-based, end-inclusive genomic coordinate system. This mismatch does not affect most downstream ATAC\u2013seq analysis that bins insertions on the scale of hundreds of base pairs, but it does affect base pair-sensitive analysis such as transcription factor footprinting and motif analysis. In this SHARE-seq pre-processing pipeline, we have adopted the +4\/\u22124 offset instead, which results in a consensus insertion site. See example motifs generated from reads corrected by either of these offset schemes in supplementary figure\u00a03a in ref. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/2024.12.25.630221&#10;                &#10;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR17\" id=\"ref-link-section-d9580035e2562\">17<\/a><\/sup>.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec18\">SHARE-seq data QC and filtering<\/h3>\n<p>We performed per-sample quality control (QC) filtering by manually inspecting and thresholding the following metrics: (1) TSS enrichment ratio and number of fragments for ATAC\u2013seq fragment files; (2) number of UMIs, number of genes and percentage of mitochondrial reads for RNA sparse matrices; (3) ratio of RNA UMIs versus ATAC\u2013seq fragments to remove cells with low quality in one modality (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#Fig8\">2a<\/a>). All sample filtering thresholds are summarized in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">1<\/a>. No explicit batch effect correction was performed, as individual-specific effects are often confounded with temporal differences in cell type composition, making it challenging to separate these sources of variation.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec19\">RNA normalization, ambient RNA removal, dimensionality reduction and clustering<\/h3>\n<p>We used Seurat (v4.3.0)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 100\" title=\"Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573&#x2013;3587 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR100\" id=\"ref-link-section-d9580035e2588\">100<\/a><\/sup> in R (v4.1.2) to process filtered RNA sparse matrices into Seurat objects per organ<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 100\" title=\"Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573&#x2013;3587 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR100\" id=\"ref-link-section-d9580035e2592\">100<\/a><\/sup>. We adopted an iterative dimensionality reduction and clustering workflow to sequentially annotate cell types and filter out additional low-quality clusters (Extended Data Fig. <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#Fig8\">2a<\/a>). For each iteration, we first performed SCTransform v2 and variable feature selection on RNA raw counts of each sample, then selected the top 3,000 consensus variable features across samples using the SelectIntegrationFeatures function from Seurat, excluding mitochondrial genes, sex chromosomes genes, and cell cycle genes to minimize batch effects. We merged the raw RNA counts from per-sample objects into a single matrix, performed SCTransform v2 using consensus features, and used the DecontX function from the celda (v1.6.1) package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 101\" title=\"Yang, S. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR101\" id=\"ref-link-section-d9580035e2599\">101<\/a><\/sup> on SCT-corrected counts to remove ambient RNA contamination per cell. The decontaminated counts were then split by sample, scaled to 10,000 UMIs per cell and log-normalized. Similar to the process mentioned above, we selected a list of top 3,000 consensus variable features from the per-sample variable features. Principal components analysis was performed on the merged object with the consensus features, followed by cell clustering using the Louvain algorithm at a resolution of 0.3 with 50 principal components and uniform manifold approximation and projection (UMAP) embedding. We then inspected each cluster and removed any low-quality clusters with significantly lower UMIs than other clusters, high levels of co-expression for different tissue compartment markers that are biologically impossible and suggestive of doublets (for example, high expression for both epithelial and endothelial compartment markers), or no clear cell type-defining marker genes. After removing cells in the low-quality clusters, we repeated the processing steps starting from RNA raw counts for each sample. This process was repeated until no more low-quality clusters were identified, which usually required one to three iterations. Cells in the final set of clusters passing this iterative QC were considered \u2018whitelisted\u2019. For each cell type cluster, marker genes were identified in a one-versus-all Wilcoxon rank-sum test versus all other clusters from the same organ, and filtered to obtain genes with a log<sub>2<\/sub> fold change greater than 1.<\/p>\n<p>All final cluster annotations are included as Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">2<\/a>. All cluster markers are summarized in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">3<\/a> and a subset is visualized in marker gene dot plots in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">2<\/a>. All UMAP embeddings are included in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">2<\/a>.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec20\">ATAC\u2013seq peak calling, motif enrichment and chromVAR<\/h3>\n<p>We used ArchR (v1.0.2)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 102\" title=\"Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403&#x2013;411 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR102\" id=\"ref-link-section-d9580035e2628\">102<\/a><\/sup> to process filtered ATAC fragment files into ArchR projects per organ. After filtering to the final whitelisted cell barcodes from the iterative RNA processing workflow and transferring the clustering and cell type annotations, we called peaks per cluster using Macs2 (v2.2.7.1)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 103\" title=\"Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR103\" id=\"ref-link-section-d9580035e2632\">103<\/a><\/sup>, merged peaks into a single reproducible peak set per organ using ArchR\u2019s iterative overlap strategy, and created a cell-by-peak matrix of fragment counts. We identified marker peaks per cluster using a Wilcoxon rank-sum test and performed transcription factor motif enrichment within the marker peaks with a cutoff of FDR\u2009\u2264\u20090.1 and log<sub>2<\/sub>(fold change)\u2009\u2265\u20090.5. We calculated chromVAR<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Schep, A. N., Wu, B., Buenrostro, J. D. &amp; Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975&#x2013;978 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR22\" id=\"ref-link-section-d9580035e2638\">22<\/a><\/sup> motif deviations across all clusters within each organ<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Schep, A. N., Wu, B., Buenrostro, J. D. &amp; Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975&#x2013;978 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR22\" id=\"ref-link-section-d9580035e2642\">22<\/a><\/sup>. For both of these analyses, we used a curated cisBP motif set of 1,141 unique human transcription factor motifs described in refs. <sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 104\" title=\"Kartha, V. K. et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genom. 2, 100166 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR104\" id=\"ref-link-section-d9580035e2647\">104<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 105\" title=\"Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431&#x2013;1443 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR105\" id=\"ref-link-section-d9580035e2650\">105<\/a><\/sup>. We created a global ArchR project by merging all 12 per-organ ArchR projects and an HDMA global caCRE set by iteratively overlapping peak sets called from individual clusters across all organs<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 102\" title=\"Granja, J. M. et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403&#x2013;411 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR102\" id=\"ref-link-section-d9580035e2654\">102<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 106\" title=\"Li, Z., Cao, H. &amp; Xu, W. The Role of the Tumor Microenvironment (TME) and Relevant Novel Biomarkers in Oncogenesis (Frontiers Media, 2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR106\" id=\"ref-link-section-d9580035e2657\">106<\/a><\/sup>.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec21\">Linkage of regulatory elements to genes with modified ABC model<\/h3>\n<p>We used the ABC approach to link caCREs to gene promoters. To ensure consistency of ABC enhancer regions as those in our HDMA global caCREs set, we adapted ABC<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664&#x2013;1669 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR24\" id=\"ref-link-section-d9580035e2669\">24<\/a><\/sup> to enable custom regions as inputs to the model. To create the custom region set, we used bedtools to merge the HDMA global caCREs set with the hg38 genome TSS set. Following the recommended scATAC workflow from the official ABC documentation (<a href=\"https:\/\/abc-enhancer-gene-prediction.readthedocs.io\/en\/latest\/index.html\">https:\/\/abc-enhancer-gene-prediction.readthedocs.io\/en\/latest\/index.html<\/a>), we used the pseudobulk ATAC\u2013seq signal (without H3K27ac chromatin immunoprecipitation with sequencing (ChIP\u2013seq)) as enhancer activity and estimated 3D contact frequency between enhancers and promoters using a power law function of genomic distance. ABC was run on each L1 cell cluster. The results were filtered to obtain enhancer\u2013promoter links with an ABC score greater than 0.013, which corresponds to a 70% recall rate from the benchmark CRISPR dataset. Our modified ABC workflow is available at: <a href=\"https:\/\/github.com\/GreenleafLab\/ABC-Enhancer-Gene-Prediction-CustomRegions\">https:\/\/github.com\/GreenleafLab\/ABC-Enhancer-Gene-Prediction-CustomRegions<\/a> (commit b3d2156).<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec22\">VISTA enhancer analysis<\/h3>\n<p>We filtered the results to obtain VISTA-validated enhancers (accessed 24 January 2024)\u00a0that originated from humans or have a human sequence homologue and annotated as X-Gal positive in any organs present in HDMA, overlapped with HDMA global caCREs, and retained VISTA enhancers with a minimum of 75% (375\u2009bp) overlap. If multiple caCREs overlapped the same VISTA enhancer, we chose the caCRE with the highest ATAC\u2013seq signal, based on previous observations that VISTA enhancers often have a much smaller core element and enhancer activity does not depend on all regulatory elements within an enhancer<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 107\" title=\"Snetkova, V. et al. Ultraconserved enhancer function does not require perfect sequence conservation. Nat. Genet. 53, 521&#x2013;528 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR107\" id=\"ref-link-section-d9580035e2696\">107<\/a><\/sup>. ATAC\u2013seq signal was scaled and log-normalized per L1 cluster then <i>z<\/i>-scored across clusters per enhancer. For each organ, a one-sided Wilcoxon rank-sum test was performed to calculate the statistical significance of the HDMA ATAC\u2013seq signal enrichment in caCREs overlapping VISTA enhancers annotated as positive in that organ. For example, to test the significance of brain ATAC\u2013seq signal enrichment, we first subsetted to HDMA brain clusters only, then compared ATAC\u2013seq signal in caCREs overlapping VISTA brain-positive enhancers and caCREs overlapping VISTA brain-negative enhancers using a one-sided Wilcoxon rank-sum test. The effect size was calculated as the <i>W<\/i> statistic\/(<i>n<\/i><sub>1<\/sub>\u2009\u00d7\u2009<i>n<\/i><sub>2<\/sub>), where <i>n<\/i><sub>1<\/sub> is the number of caCREs in the brain-positive group and <i>n<\/i><sub>2<\/sub> is the number of caCREs in the brain-negative group. This effect size represents the AUROC probability that a given caCRE in the brain-positive group will have higher ATAC signal than a caCRE in the brain-negative group. Similarly, for the RNA data, we first identified the nearest gene for each caCRE overlapping a VISTA enhancer, scaled and log-normalized the raw RNA expression counts per L1 cluster, and then <i>z<\/i>-scored expression values across clusters per enhancer. An analogous Wilcoxon rank-sum test was performed for each organ to assess the statistical significance of the HDMA RNA signal enrichment in VISTA positive enhancers. To avoid numerical underflow to zero at machine precision in Wilcoxon rank-sum tests, <i>P<\/i> values were calculated on the log<sub>10<\/sub> scale and reported accordingly.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec23\">Preparation of input regions for ChromBPNet models<\/h3>\n<p>To define genomic regions for training ChromBPNet models, we performed a second round of peak calling to obtain a lenient set of accessible regions. First, pseudobulk fragment files for each of the 203 L1 cell type clusters were generated by concatenating fragments from the SHARE-seq ATAC\u2013seq modality for all cells in that cluster, from all samples. For each pseudobulk, we then derived pseudoreplicates. For each fragment, starts and ends (corresponding to Tn5 insertion sites) were randomly allocated to each of two pseudoreplicate files, and pseudoreplicate files were also concatenated into a total-pseudoreplicate file. Macs2 (v2.2.9.1) was used to call peaks on all three pseudoreplicate files with parameters: -p 0.01\u2013shift \u221275\u2013extsize 150\u2013nomodel -B\u2013SPMR\u2013keep-dup all\u2013call-summits. Only peaks called on the total-pseudoreplicate which overlapped peaks called in both pseudoreplicates were retained. Peaks overlapping the GRCh38 ENCODE blacklist (ENCODE accession ENCFF356LFX) were excluded. Peak coordinates were adjusted to 1,000\u2009bp centred at the Macs2 peak summit. Pseudoreplicates were only used for peak calling, and pseudobulk fragment files were used for downstream model training.<\/p>\n<p>We used the ChromBPNet package (<a href=\"https:\/\/github.com\/kundajelab\/chrombpnet\">https:\/\/github.com\/kundajelab\/chrombpnet<\/a>, commit a5c231) and followed the workflow described<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/2024.12.25.630221&#10;                &#10;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR17\" id=\"ref-link-section-d9580035e2750\">17<\/a><\/sup>. We used the command chrombpnet prep nonpeaks to define background regions that match the GC content of peak regions. For each cell type, we used a fivefold cross-validation scheme, where each fold (designated 0 to 4) comprised a different set of training, validation, and test chromosomes, with each chromosome in the test set of at least one fold. We used the default human chromosome folds provided with ChromBPNet<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 108\" title=\"Pampari, A., Shcherbina, A. &amp; Kundaje, A. ChromBPNet data files. Zenodo &#10;                https:\/\/doi.org\/10.5281\/ZENODO.7445373&#10;                &#10;               (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR108\" id=\"ref-link-section-d9580035e2754\">108<\/a><\/sup>.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec24\">ChromBPNet model training<\/h3>\n<p>ChromBPNet models are supervised convolutional neural networks trained to use 2,114-bp one-hot-encoded DNA sequence in peaks and background regions to predict the accessibility profile (as a probability distribution) and total natural log counts (as a scalar value) in the central 1,000-bp window of input regions. ChromBPNet models use a pre-trained bias model, and then explain the residual accessibility not captured by Tn5 enzyme bias. We trained a bias model to learn the enzymatic bias in the SHARE-seq setting using fold 0 for the heart L1 cluster 0 pseudobulk, with bias threshold factor -b 0.4 using the chrombpnet bias pipeline which also performs model interpretation using DeepLIFT. We confirmed that the bias model learned the Tn5 motifs but not transcription factor motifs, and used this bias model to subsequently train ChromBPNet models for five folds for each of the 203 L1 cell types (1,015 models total) using the chrombpnet pipeline command with the GRCh38 reference genome from ENCODE (fasta: <a href=\"https:\/\/www.encodeproject.org\/files\/GRCh38_no_alt_analysis_set_GCA_000001405.15\/\">https:\/\/www.encodeproject.org\/files\/GRCh38_no_alt_analysis_set_GCA_000001405.15\/<\/a>, with chromosome sizes from ENCODE accession ENCFF667IGK).<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec25\">ChromBPNet model evaluation<\/h3>\n<p>Models were evaluated based on the Pearson and Spearman correlations between predicted and observed log counts in peaks and the Jensen\u2013Shannon distance between predicted and observed profiles in peaks, for peaks on held-out test-set chromosomes (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">5<\/a> and Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">2<\/a>). Models for 4 cell types where the Spearman correlation for any fold was less than 0.5, generally corresponding to pseudobulks with low coverage, were excluded. To generate the average predicted accessibility tracks across folds for peak regions (representing counts per base), for each region, the mean predicted profile logits across folds were processed with the softmax function to convert them to probabilities, then scaled by the exponentiated mean predicted log counts across folds.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec26\">ChromBPNet model interpretation<\/h3>\n<p>We performed model interpretation to determine the extent to which each nucleotide was predictive for accessibility. We ran the chrombpnet interpret command which uses the DeepLIFT<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Shrikumar, A., Greenside, P. &amp; Kundaje, A. Not just a black box: learning important features through propagating activation differences. Preprint at &#10;                https:\/\/doi.org\/10.48550\/arXiv.1605.01713&#10;                &#10;               (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR30\" id=\"ref-link-section-d9580035e2795\">30<\/a><\/sup> algorithm to compute contribution scores for each nucleotide in the 2,114-bp input windows with respect to the predicted counts. Contribution scores were derived for each model fold for all peak regions, and the mean computed across folds. The averaged predicted accessibility profiles and contribution scores were converted to bigWig files, as well as used for all analyses and figures.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec27\">De novo motif discovery and assembly of the motif lexicon<\/h3>\n<p>Assembly of the de novo motif lexicon required three steps: (1) de novo motif discovery per cell type; (2) collapsing similar motifs across cell types; and (3) motif annotation.<\/p>\n<ol class=\"u-list-style-none\">\n<li>\n                  <span class=\"u-custom-list-number\">1)<\/span><\/p>\n<p>First, for de novo motif discovery on sequences driving chromatin accessibility, we used TF-MoDISco<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Shrikumar, A. et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. Preprint at &#10;                https:\/\/doi.org\/10.48550\/arXiv.1811.00416&#10;                &#10;               (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR31\" id=\"ref-link-section-d9580035e2817\">31<\/a><\/sup>, which, in brief, identifies seqlets, corresponding to short spans of contiguous high positive-importance or high negative-importance nucleotides, and clusters them into recurrent 30\u2009bp patterns. We used the implementation in the TF-MoDISco-lite package (<a href=\"https:\/\/github.com\/kundajelab\/tfmodisco\">https:\/\/github.com\/kundajelab\/tfmodisco<\/a>, v2.0.7) on the mean contribution scores for each cell type, sampling 1,000,000 seqlets for both positively and negatively contributing seqlets (parameter -n 1,000,000), and using the default behaviour to search for seqlets in the central 400\u2009bp of input regions. Each de novo motif is represented by a 4\u2009\u00d7\u200930 CWM, computed as the mean contribution score per position per nucleotide across its seqlets, and a 4\u2009\u00d7\u200930 position probability matrix (PPM), computed by normalizing the nucleotide frequencies per position across its seqlets. We manually inspected CWMs learned in each cell type, and the ChromBPNet models and motifs for ten cell types which predominantly learned low-complexity motifs were excluded from downstream analysis. This resulted in 189 cell types (945 models total) passing quality control and used throughout our study, which collectively learned 6,362 motifs including both positively contributing and negatively contributing motifs.<\/p>\n<\/li>\n<li>\n                  <span class=\"u-custom-list-number\">2)<\/span><\/p>\n<p>Next, we automatically consolidated the 6,362 motifs into a non-redundant set. We first derived clusters of motifs which were broadly similar. CWMs were trimmed by removing positions where the total contribution across nucleotides was less than 30% of the maximum total contribution among all positions<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Pampari, A. et al. ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor footprints and regulatory variants. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/2024.12.25.630221&#10;                &#10;               (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR17\" id=\"ref-link-section-d9580035e2839\">17<\/a><\/sup>. PPMs were trimmed to the same positions as the trimmed CWMs, and converted to position frequency matrices (PFMs) by multiplying PPMs by the total number of seqlets associated with each motif. PFMs from all cell types in each organ were first leniently clustered, separately for positive and negative motifs, using the gimmemotifs package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 109\" title=\"Bruse, N. &amp; van Heeringen, S. J. GimmeMotifs: an analysis framework for transcription factor motif analysis. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/474403&#10;                &#10;               (2018).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR109\" id=\"ref-link-section-d9580035e2843\">109<\/a><\/sup> (v0.18.0, with command gimme cluster -t 0.8), which returns an average PFM for each motif cluster. Average PFMs from across organs were then subjected to a second round of clustering with gimmemotifs. Within each broad motif cluster, we then collapsed the constituent CWMs originating from individual cell types using the SimilarPatternCollapser functionality in TF-MoDISco, which merges together similar motifs using the same method it does for seqlets during initial motif discovery in step (1). This procedure resulted in 834 motifs.<\/p>\n<\/li>\n<li>\n                  <span class=\"u-custom-list-number\">3)<\/span><\/p>\n<p>Finally, we performed motif quality control, and annotated and categorized each motif. For annotation, we used TOMTOM<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 110\" title=\"Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. &amp; Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR110\" id=\"ref-link-section-d9580035e2858\">110<\/a><\/sup> (v4.11.2) to compute similarities between the 834 de novo motif CWMs and a curated set of 5,193 known transcription factor binding site position weight matrices (PWMs) from JASPAR, CIS-BP, and other studies (obtained from <a href=\"https:\/\/resources.altius.org\/~jvierstra\/projects\/motif-clustering-v2.1beta\/\">https:\/\/resources.altius.org\/~jvierstra\/projects\/motif-clustering-v2.1beta\/<\/a>), using the command tomtom -no-ssc -oc.\u2013verbosity 1 -text -min-overlap 5 -mi 1 -dist pearson -evalue -thresh 10.0. For every motif, we manually inspected the most similar PWMs to assign a provisional label, typically at the transcription factor family level. We further collapsed highly similar motifs missed by the clustering approach, retaining the motif with the highest number of seqlets across cell types. We flagged 107 motifs which were low-complexity, noisy, or dominated by CG dinucleotides for exclusion. We categorized motifs as \u2018base\u2019 if they matched known PWMs; \u2018base with flanks\u2019 if they matched known PWMs but exhibited additional high-contribution nucleotides; and \u2018homocomposite\u2019 or \u2018heterocomposite\u2019 if the motifs clearly matched two similar or distinct known motifs, respectively. Motifs that did not resemble known PWMs were labelled as \u2018unresolved\u2019. After exclusion of low-quality motifs, this resulted in a set of 508 non-redundant motifs used for downstream analysis (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">6<\/a>). Motif labels have the naming scheme \u2018<unique_id>\u2009|<family>\u2009:<subfamily>\u2009\/<alternative_subfamily>\u2009#<index>\u2009\u2019. The unique ID is a value from 0 to 508 (one ID corresponding to the low-quality motifs was filtered out). The index is used to distinguish separate motifs which each match the same known motif, typically representing subtle variations in nucleotide preferences or flanks. Composite motifs used a similar naming scheme, with labels for constituent motifs separated by underscores. Motifs were also assigned a broad label (for example, \u2018CTCF\u2019 and \u2018CTCFupstream\u2019 motifs shared a broad \u2018CTCF\u2019 label), used throughout figures and analyses to aggregate results.<\/index><\/alternative_subfamily><\/subfamily><\/family><\/unique_id><\/p>\n<\/li>\n<\/ol>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec28\">Identification of predictive motif instances<\/h3>\n<p>To identify genomic instances of de novo motifs in each cell type, we used the Fi-NeMo package (<a href=\"https:\/\/github.com\/kundajelab\/finemo_gpu\">https:\/\/github.com\/kundajelab\/finemo_gpu<\/a>, v0.23, commit b81876d), which performs motif scanning in a contribution-aware manner. In brief, Fi-NeMo fits a sparse linear regression model for each peak region to minimize the difference between the contribution scores in each region, and a reconstruction of the scores from a weighted combination of trimmed input CWMs. In the coefficient matrix, a non-zero coefficient at a certain location indicates a motif instance in that location. For each cell type, the output of Fi-NeMo is a BED file of predictive instances across all peaks, representing short regions which both match motifs by sequence and have relatively high absolute contribution scores. We first ran Fi-NeMo with all 834 clustered CWMs, with parameters \u2013alpha 0.8 \u2013trim-threshold 0.3 and defaults otherwise (parameter alpha is now known as lambda). Due to the competitive nature of motifs in the linear modelling approach, we ran Fi-NeMo a second time with a reduced set of 436 motifs after excluding composite motifs. We performed post-hoc filtering of motifs to obtain a high-confidence annotation for downstream analysis. To evaluate the quality of instance calls for a given motif, Fi-NeMo computes the correlation between the input CWM and the CWM derived from averaging contribution scores for all Fi-NeMo-identified instances for that motif. For each cell type, instances from the two Fi-NeMo runs were concatenated, and motifs where correlation between the instance-CWM and the input CWM was less than 0.9 were flagged. Next, all instances of these flagged motifs were filtered out. Finally, to reduce redundancy, if multiple instances with the same annotation overlapped by more than 3\u2009bp, only the instance with the highest \u2018hit_correlation\u2019 value was retained, representing the instance with the contribution scores having the highest similarity with the corresponding input CWM. This step resulted in final motif annotation for each cell type consisting of their predictive motif instances.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec29\">Inference of nucleosome positions from ATAC modality<\/h3>\n<p>We used NucleoATAC<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Schep, A. N. et al. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 25, 1757&#x2013;1770 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR34\" id=\"ref-link-section-d9580035e2899\">34<\/a><\/sup> to infer nucleosome position and occupancy from the SHARE-seq ATAC modality data. In brief, to determine nucleosome occupancy, NucleoATAC models the observed size distribution of ATAC fragments as a mixture of nucleosomal and nucleosome-free fragments, and the maximum likelihood fraction of nucleosomal fragments at a given position is used as a continuous occupancy signal. We adapted the original code to take fragments as input (<a href=\"https:\/\/github.com\/sjessa\/NucleoATAC\">https:\/\/github.com\/sjessa\/NucleoATAC<\/a>, v0.4.1), and ran the NucleoATAC workflow for each cluster pseudobulk fragment file in the same peak regions used to train ChromBPNet models. The outputs of the nucleoatac occ command were used downstream: the nucleosome dyad position calls were used for analysis of motif instances, and the per-nucleotide maximum likelihood fraction of nucleosomal fragments were used for visualization of nucleosome occupancy as genomic tracks.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec30\">Annotation of motif instances<\/h3>\n<p>To annotate predictive motif instances with respect to genomic features, instances were assigned as occurring in promoters if they were within 2\u2009kb upstream or downstream of TSSs, exonic if they overlapped exons, intronic if they overlapped gene bodies but not exons, and distal otherwise. Genomic features definitions were based on the Bioconductor TxDb.Hsapiens.UCSC.hg38.knownGene (v3.14.0) annotation, corresponding to the UCSC knownGene track from GENCODE V38, and assembled using the createGeneAnnotation function in the ArchR package. Motif instance distance to TSS was computed as the distance between instance centre positions and TSS defined in the same annotation. Similarly, for each cell type, motif distance to the nucleosome dyad or peak summit was computed by calculating the distance between instance centre positions and the nearest dyad inferred with NucleoATAC, or the peak summit, respectively. For analysis, we counted motif instances in 10\u2009bp bins from 0 to 250\u2009bp from the dyad or peak summit.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec31\">Motif co-occurrence analysis<\/h3>\n<p>To identify co-enriched motifs, we tested for pairwise motif co-occurrence enrichment within the ChromBPNet training peaks for each cell type. We restricted the analysis to the set of base motifs within the de novo compendium, and grouped motifs by their broad annotations. For each motif pair, in each cell type, we populated a 2\u2009\u00d7\u20092 contingency table with the number of peaks containing both, one, or neither motif. We then performed a one-sided Fisher\u2019s exact test to calculate a <i>P<\/i> value for enrichment. To correct for multiple hypothesis testing across all pairs, we adjusted the <i>P<\/i> values using the Benjamini\u2013Hochberg method. Significant co-occurrence was assigned for motif pairs with adjusted <i>P<\/i> values\u2009&lt;\u20090.05.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec32\">In silico marginalizations to assess motif synergy<\/h3>\n<p>We used the trained ChromBPNet models to assess motif synergy following previous approaches<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Kim, D. S. et al. The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nat. Genet. 53, 1564&#x2013;1576 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR19\" id=\"ref-link-section-d9580035e2943\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Avsec, &#x17D;. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354&#x2013;366 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR42\" id=\"ref-link-section-d9580035e2946\">42<\/a><\/sup>. In this in silico marginalization strategy, the predicted effect of a short sequence on chromatin accessibility is quantified by inserting the sequence into a library of background, non-accessible genomic regions (replacing the central nucleotides); and predicting accessibility for each background and edited region with a forward pass through a trained ChromBPNet model. By averaging the difference in predicted natural log counts between the edited and background regions over many regions, we estimate the \u2018marginal\u2019 effect of a sequence. Two or more sequences can be inserted to estimate joint effects of those sequences on accessibility. Specifically, for two motifs A and B, we define the predicted log counts for a region with one motif or both inserted as <span class=\"mathjax-tex\">\\({y}_{{\\rm{A}}}\\)<\/span>, <span class=\"mathjax-tex\">\\({y}_{{\\rm{B}}}\\)<\/span> and <span class=\"mathjax-tex\">\\({y}_{{\\rm{j}}}\\)<\/span> respectively; and the predicted log counts for an unedited background region as <span class=\"mathjax-tex\">\\({y}_{0}\\)<\/span>. The marginal effects, in log counts, of motif A and B are <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{A}}}={y}_{{\\rm{A}}}-{y}_{0}\\)<\/span> and <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{B}}}={y}_{{\\rm{B}}}-{y}_{0}\\)<\/span>, respectively, and their joint effect is <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}={y}_{{\\rm{J}}}-{y}_{0}\\)<\/span>. We define the independent effects of motifs A and B as <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{S}}}={\\Delta }_{{\\rm{A}}}+{\\Delta }_{{\\rm{B}}}\\)<\/span>, corresponding to a log-additive model for independent effects, or multiplicative model in units of counts. Synergy can then be defined as a significant increase in the joint effects <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}\\)<\/span> of two sequences relative to their independent effects <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{S}}}\\)<\/span>.<\/p>\n<p>To implement the synergy analysis, we used the tangermeme package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Schreiber, J. tangermeme: A toolkit for understanding cis-regulatory logic using deep learning models. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/2025.08.08.669296&#10;                &#10;               (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR41\" id=\"ref-link-section-d9580035e3238\">41<\/a><\/sup> (<a href=\"https:\/\/github.com\/jmschrei\/tangermeme\">https:\/\/github.com\/jmschrei\/tangermeme<\/a>, v0.4.3). We first filtered the set of de novo composite motifs such that each composite was composed of a unique pair of constituent motifs, obtaining 138 composite motifs. For each constituent, we identified the base de novo motif with the highest number of motif instances, trimmed the CWM as above, and defined the consensus sequence as the nucleotide with the highest contribution score at each position in the trimmed motif. The motifs and associated sequences tested are presented in Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">7<\/a>. Sequences were manually adjusted to further remove uninformative flanks or better match the composite motif, and deduplicated so that a pair of the same sequences was only tested once. One hundred background regions were randomly selected from GC-matched background regions for each cell type. For a pair of two of the same motifs, there are three unique orientations, and for a pair of two distinct motifs, there are four unique orientations. We considered each combination of orientation and distance between motifs an \u2018arrangement\u2019 of motifs. For each composite motif, the constituent motif sequences A and B were inserted at all possible orientations and distances (from 0 to 200\u2009bp) in the 100 background sequences, and accessibility predicted for background and edited sequences using all five ChromBPNet model folds (for the cell type with the most predictive instances of that composite motif) to compute <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}\\)<\/span> (Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">3<\/a>). Similarly, A and B were inserted alone to compute independent and sum of independent effects <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{A}}}\\)<\/span>, <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{B}}}\\)<\/span>, and <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{S}}}\\)<\/span>. Effects are computed for each sequence and model fold and the mean effect is reported (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">7<\/a>). For joint effects, the effect of the motif pair at their optimal arrangement (that is, the combination of orientation and distance with the greatest mean effect <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}\\)<\/span>) is reported. We considered each motif pair at their optimal arrangement and used a Wilcoxon signed-rank test to test whether the paired differences in joint and independent effects at that arrangement (<span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}-{\\Delta }_{{\\rm{S}}}\\)<\/span>) were significantly greater than 0. Multiple testing correction was performed using the Benjamini\u2013Hochberg method. Composite motifs with adjusted <i>P<\/i> value\u2009&lt;\u20090.001 and <span class=\"mathjax-tex\">\\({(\\Delta }_{{\\rm{J}}}-{\\Delta }_{{\\rm{S}}})\\)<\/span>\u2009&gt;\u20090.15 were annotated as synergistic. To confirm that inserted sequences were driving the predicted synergistic effects, we performed model interpretation using DeepLIFT as above on edited sequences, and verified that the sequences predictive of accessibility corresponded to the sequences we inserted. This identified 18 composite motifs where predicted effects were driven by different nucleotides than the ones inserted; and we abstained from classifying synergy for these motifs, resulting in 120 motifs with synergy classifications (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">7<\/a>).<\/p>\n<p>To define synergistic motifs with syntax preferences (that is, with synergy limited to or increased at specific binding site arrangements), for each composite motif, we computed <i>z<\/i>-scores across the joint effects <span class=\"mathjax-tex\">\\({\\Delta }_{{\\rm{J}}}\\)<\/span> at all arrangements. Composite motifs that had any arrangement with an effect greater than four standard deviations from the mean (<i>z<\/i>-score\u2009&gt;\u20094) were annotated as having hard syntax. We also considered that motifs could have weaker long-range preferences, or soft syntax. Composite motifs with <span class=\"mathjax-tex\">\\({(\\Delta }_{{\\rm{J}}}-{\\Delta }_{{\\rm{S}}})\\)<\/span>\u2009&gt;\u20090.15 at any arrangement where constituent motifs were between 20 and 150\u2009bp apart were annotated as having soft syntax.<\/p>\n<p>To validate the specificity of our predictions, we repeated the in silico marginalization experiments for the 138 motif pairs in all 189 cell types with ChromBPNet models passing QC (Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM1\">3<\/a>). For each cell type, sequences were inserted into its respective background regions, and mean effects were computed across 100 sequences and 5 model folds. Effects were computed as above, and multiple testing correction was performed using the Benjamini\u2013Hochberg method. To assess cell-type specificity of predicted synergistic effects, for select composite motifs, we analysed the in silico marginalization for the optimal arrangement across all cell types.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec33\">In silico motif ablations<\/h3>\n<p>Similarly, for select motifs, we implemented in silico ablations using tangermeme, using trained ChromBPNet models for one cell type (Heart_c3, fibroblasts). For each motif, we computed quartiles of the motif instances based on the Fi-NeMo hit correlation score, and randomly sampled 250 motif instances from each quartile (1,000 sequences total). To ablate motifs in silico, we \u2018neutralized\u2019 each instance by replacing the base at each position with the base that had the smallest absolute value contribution score in the hypothetical DeepLIFT scores\u2014that is, the most neutral base. The hypothetical contribution scores are counterfactual estimates of the contributions of all three alternate bases at each position, had it been observed in the sequence context, and are computed using DeepLIFT as part of the ChromBPNet model interpretation workflow.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec34\">Ranking of motifs by prevalence and importance<\/h3>\n<p>To assess motif prevalence and importance, we focused on motifs learned in each cell type in the cell-type-specific TF-MoDISco motif discovery step, and only considered positive motifs. To compute motif prevalence, for each cell type, we obtained the set of motifs learned in that cell type, and the number of motif instances in that cell type for the corresponding lexicon motifs. We ranked motifs within the cell type based on their number of instances, and normalized ranks by dividing each rank by the maximum rank in that cell type such that normalized ranks fell in the range [0, 1]. Similarly, to compute motif importance, for each cell type, we obtained the set of motifs learned in that cell type and summed the contribution scores across nucleotides and positions for the corresponding trimmed CWM. Finally, to compare prevalence and importance of different classes of motifs defined in the synergy analysis, we grouped motifs based on whether they had hard syntax, soft syntax, no predicted synergy, or were not tested (meaning the motif was not a composite motif, or filtered out of the synergy analysis as described above). Motifs with both hard and soft syntax were grouped with hard syntax motifs. We then computed the mean normalized rank across motifs in each group, within each cell type.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec35\">Motif footprinting<\/h3>\n<p>For select broad motifs (CTCF, ZEB\/SNAIL, NFY, the NFY negative motif, YY1\/2 and the YY1\/2 negative motif), we used our deep learning-derived motif instances, filtered to the most common variant of the motif, and then split instances into quartiles based on the Fi-NeMo hit correlation score. To compute motif footprint metaplots from the SHARE-seq ATAC modality, we used the BPCells footprint function, which counts insertions (that is, fragment starts and ends) that fall within each position in 500-bp windows centred at motif instances. The insertion counts at each position are then divided by the mean insertion count in the outer 10% on each side of the window, taken as a measure of the local background accessibility.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec36\">Enrichment of eQTL variants in motifs<\/h3>\n<p>We obtained tissue-specific GTEx v8 eQTL data<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 54\" title=\"The GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318&#x2013;1330 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR54\" id=\"ref-link-section-d9580035e3537\">54<\/a><\/sup> and concatenated them by organ source to match our fetal organs, aggregating the tissues as follows: adrenal gland, brain (amygdala, anterior cingulate cortex BA24, caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex BA9, hippocampus, hypothalamus, nucleus accumbens basal ganglia, putamen basal ganglia, spinal cord cervical c-1, substantia nigra), heart (artery aorta, artery coronary, atrial appendage, left ventricle), lung, liver, muscle (artery tibial, skeletal), skin (not sun exposed suprapubic, sun exposed lower leg), spleen, stomach\/oesophagus (stomach, oesophagus gastro-oesophageal junction, oesophagus mucosa, oesophagus muscularis) and thyroid. We obtained unique lists of variant-gene pairs per organ by selecting the variant with the higher posterior inclusion probability score in case of duplicates.<\/p>\n<p>For each organ, the log<sub>2<\/sub> allelic fold change effect sizes (aFCs) for each variant-gene pair was used to determine the direction of variant effect on gene expression (upregulating or downregulating expression). Separately for each organ, we concatenated all motif instances from each cell type, then deduplicated entries with identical motif names and genomic positions. The number of upregulating and downregulating variants that overlapped positive or negative motif instances from the matched fetal organ were counted separately to obtain observed counts. aFCs were then randomly shuffled and direction of effect reassigned before the counting was repeated. We performed 100,000 shuffles per organ. Enrichment scores were defined as observed counts divided by the mean of the 100,000 shuffled counts. Enrichment <i>P<\/i> values were calculated as the proportion of shuffles where shuffled counts were larger than observed counts. Multiple testing correction using the Benjamini\u2013Hochberg method was applied to the <i>P<\/i> values and a motif was considered significantly enriched with upregulating or downregulating variants if the FDR was &lt;0.05 and the observed count was above the 95% confidence interval of shuffled counts for that type of eQTL variant. Two-sided Fisher\u2019s exact tests were performed separately for positive and negative motifs to compare the number of motifs with or without significant enrichment with upregulating or downregulating eQTL variants (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">8<\/a>).<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec37\">Enrichment of cell types with disease variants using g-chromVAR<\/h3>\n<p>We used g-chromVAR<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 60\" title=\"Ulirsch, J. C. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683&#x2013;693 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR60\" id=\"ref-link-section-d9580035e3563\">60<\/a><\/sup> to compute cell-type-specific enrichment of disease variants. As all genetic variants in CAUSALdb<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 59\" title=\"Wang, J. et al. CAUSALdb: a database for disease\/trait causal variants identified using summary statistics of genome-wide association studies. Nucleic Acids Res. 48, D807&#x2013;D816 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR59\" id=\"ref-link-section-d9580035e3567\">59<\/a><\/sup> are reported in hg19 coordinates, ChromBPNet model input peak sets from fetal cell types were lifted over from hg38 to hg19 coordinates, and these were used as input to g-chromVAR. All coordinate conversions were performed using liftOver<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 111\" title=\"Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590&#x2013;D598 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR111\" id=\"ref-link-section-d9580035e3571\">111<\/a><\/sup> as implemented in the rtracklayer (v1.54.0) R package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 112\" title=\"Lawrence, M., Gentleman, R. &amp; Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841&#x2013;1842 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR112\" id=\"ref-link-section-d9580035e3575\">112<\/a><\/sup> and the hg38ToHg19.over.chain.gz or hg19ToHg38.over.chain.gz chain files obtained from the UCSC Genome Browser. SuSiE scores for each variant listed in CAUSALdb were separately collated for all 13,710 studies in the database. CAUSALdb contains variants where linkage disequilibrium information was missing in some GWASs, which affects the causal variant estimation in the process of fine-mapping. These variants were given default PIP values of \u22121 in the database and were removed from our analysis. We separately ran g-chromVAR (v0.3.2) for each organ to obtain <i>z<\/i>-scores and <i>P<\/i> values of cell type enrichments with credible variants per study. We used the PIP values generated by SuSiE<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 69\" title=\"Wang, G., Sarkar, A., Carbonetto, P. &amp; Stephens, M. A. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B 82, 1273&#x2013;1300 (2020).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR69\" id=\"ref-link-section-d9580035e3586\">69<\/a><\/sup> and filtered out studies where the sums of the PIP values across credible variants in that study were &lt;5 to remove studies wherein we would be underpowered, retaining results from 13,194 studies. Multiple testing correction using the Benjamini\u2013Hochberg method was applied to all <i>P<\/i> values and a cell type was considered significantly enriched with variants from a particular study if the FDR was &lt;0.05. We next manually extracted the subset of the significant results corresponding to disease traits relevant to the organs in HDMA (see Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">10<\/a> for the list of studies). Potentially causal variants (SuSiE PIP\u2009\u2265\u20090.8) were lifted over from hg19 to hg38 coordinates and overlapped with fetal motif coordinates. For each fetal cell type in our dataset, we then manually identified matching adult tissues and cell types with snATAC\u2013seq data in ENCODE, and obtained their snATAC\u2013seq pseudoreplicated peak sets from ENCODE<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 70\" title=\"ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57&#x2013;74 (2012).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR70\" id=\"ref-link-section-d9580035e3596\">70<\/a><\/sup> (Supplementary Table <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM3\">12<\/a>). Causal variants overlapping fetal motif instances were subsequently overlapped with the relevant adult peak sets (peaks with \u2212log<sub>10<\/sub> <i>P<\/i> value\u2009&gt;\u20092) if available. Only variants found in fewer than two matched adult peak sets were considered as \u2018fetal-only\u2019 hits.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec38\">Prediction of variant effect using ChromBPNet models<\/h3>\n<p>We predicted and interpreted effects of specific non-coding variants on chromatin accessibility using trained ChromBPNet models, as we have done previously<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053&#x2013;5069 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR7\" id=\"ref-link-section-d9580035e3617\">7<\/a><\/sup>. We used the tangermeme package for predictions and model interpretation. For each variant, we used the 1,000\u2009bp model training peaks for the relevant cell type to extract the reference genome sequence for the peak which the variant overlapped. This sequence (extended equally on either side to 2,114\u2009bp) was fed to all fivefold trained ChromBPNet models for the relevant cell type, to obtain predicted accessibility profile and aggregate log counts in the peak. For each fold, to transform predicted profile logits into accessibility profiles, the profile logits were softmaxed and scaled by the exponentiated predicted log counts; and model interpretation with respect to the counts output was performed using DeepLIFT. Next, the effect allele was substituted into the sequence at the variant position, and predictions and contribution scores were obtained as for the reference sequence. For each model, we computed the variant effect as the sum of differences in per-base predicted read counts in the 100\u2009bp window centred at variant, and computed the mean effect score across folds. We also computed the log2 fold change between predicted counts for the effect versus the non-effect allele for the peak region, where a log2 fold change &gt;0 indicates the effect allele was predicted to increase accessibility. In figures, the mean predicted profiles and contribution scores across folds are shown.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec39\">Genome browser visualizations<\/h3>\n<p>For all data visualization at specific genomic loci, we used the BPCells R package<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 113\" title=\"Parks, B. &amp; Greenleaf, W. Scalable high-performance single cell data analysis with BPCells. Preprint at bioRxiv &#10;                https:\/\/doi.org\/10.1101\/2025.03.27.645853&#10;                &#10;               (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#ref-CR113\" id=\"ref-link-section-d9580035e3629\">113<\/a><\/sup> (<a href=\"https:\/\/github.com\/bnprks\/BPCells\">https:\/\/github.com\/bnprks\/BPCells<\/a>) along with custom scripts included in our code repository. All genomic tracks are hosted online for interactive visualization with the WashU Genome Browser (<a href=\"https:\/\/epigenomegateway.wustl.edu\/browser2022\/?genome=hg38&amp;hub=https:\/\/human-dev-multiome-atlas.s3.amazonaws.com\/tracks\/HDMA_trackhub.json\">https:\/\/epigenomegateway.wustl.edu\/browser2022\/?genome=hg38&amp;hub=https:\/\/human-dev-multiome-atlas.s3.amazonaws.com\/tracks\/HDMA_trackhub.json<\/a>).<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec40\">Statistics and reproducibility<\/h3>\n<p>No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. For box plots throughout the figures, the elements represent the following: centre line, median; box limits, upper and lower quartiles; whiskers, minima and maxima that are no further than 1.5\u00d7 interquartile range.<\/p>\n<h3 class=\"c-article__sub-heading c-article__sub-heading--divider\" id=\"Sec41\">Reporting summary<\/h3>\n<p>Further information on research design is available in the\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41586-026-10326-9#MOESM2\">Nature Portfolio Reporting Summary<\/a> linked to this article.<\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.nature.com\/articles\/s41586-026-10326-9\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ethics statement De-identified tissue samples were collected at Stanford University School of Medicine from elective termination of pregnancy procedures with informed consent for the research use&hellip;<\/p>\n","protected":false},"author":1,"featured_media":50460,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[35],"tags":[],"class_list":["post-50459","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-science"],"_links":{"self":[{"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/posts\/50459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=50459"}],"version-history":[{"count":0,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/posts\/50459\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=\/wp\/v2\/media\/50460"}],"wp:attachment":[{"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=50459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=50459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/foreignnewstoday.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=50459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}