BMB Reports 2024; 57(3): 135-142  https://doi.org/10.5483/BMBRep.2023-0250
Detecting DNA hydroxymethylation: exploring its role in genome regulation
Sun-Min Lee*
Department of Physics, Konkuk University, Seoul 05029, Korea
Correspondence to: Tel: +82-2-6213-4339; Fax: +82-2-3436-5361; E-mail: sml67@konkuk.ac.kr
Received: December 15, 2023; Revised: January 15, 2024; Accepted: February 1, 2024; Published online: February 19, 2024.
© Korean Society for Biochemistry and Molecular Biology. All rights reserved.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
DNA methylation is one of the most extensively studied epigenetic regulatory mechanisms, known to play crucial roles in various organisms. It has been implicated in the regulation of gene expression and chromatin changes, ranging from global alterations during cell state transitions to locus-specific modifications. 5-hydroxymethylcytosine (5hmC), the major oxidation product of 5-methylcytosine (5mC), is generated by the ten-eleven translocation (TET) enzymes and is gradually being recognized for its significant role in genome regulation. With the development of state-of-the-art experimental techniques, it has become possible to detect and distinguish 5mC and 5hmC at base resolution. Various techniques have evolved, encompassing chemical and enzymatic approaches, as well as third-generation sequencing techniques. These advancements have paved the way for a thorough exploration of the role of 5hmC across a diverse array of cell types, from embryonic stem cells (ESCs) to various differentiated cells. This review aims to comprehensively report on recent techniques and discuss the emerging roles of 5hmC.
Keywords: 5-hydroxymethylcytosine, 5-methylcytosine, Base-resolution analysis, Development, TETs
INTRODUCTION

Epigenetics encompasses changes in gene function or expression that occur without any alterations to the DNA sequence itself. DNA methylation was the first epigenetic mark to be discovered and remains the most extensively studied. In 1948, Rollin Hotchkiss made a pioneering discovery of modified cytosine in a calf thymus preparation using paper chromatography. Hotchkiss hypothesized that this fraction represented 5-methylcytosine (5mC) based on its separation from cytosine, similar to the way thymine (methyluracil) separates from uracil (1). In mammals, the majority of genetically significant cytosine methylations take place in C-G dinucleotides (CpG). Methylation is closely associated with gene regulation, thereby influencing development, aging, and diseases, such as cancer. The DNA methyltransferase (DNMT) family of enzymes facilitates the transfer of a methyl group to DNA. DNMT1, functioning as a maintenance methyltransferase, exhibits a strong preference for hemimethylated CpG sites. In contrast, DNMT3A and DNMT3B lack specificity for hemimethylated target sites. Instead, they play a role in the de novo establishment of DNA methylation patterns (2).

The removal of a methyl group from cytosine can take place through two distinct mechanisms: active and passive demethylation. Passive demethylation primarily occurs during DNA replication when the newly synthesized DNA strand is not specifically targeted and methylated by the DNMTs. On the other hand, active DNA demethylation occurs in a Ten-eleven translocation (TET)-dependent manner and can be coupled with Thymine-DNA Glycosylase (TDG)-mediated base excision repair (BER) (Fig. 1A) (3). TET1, TET2, and TET3 catalyze the oxidation of 5mC to produce 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC) in an Fe (II)/α-ketoglutarate-dependent manner (4). TDG is capable of excising 5fC and 5caC, initiating the restoration of the modified site to its unmethylated status through the BER mechanism (5). The active demethylation process also relies on AID/APOBEC deaminases, which deaminate 5hmC to 5-hydroxymethyluracil (5hmU), creating a G:5hmU mismatch. Subsequently, the TDG and BER mechanisms come into play (6).

5hmC is now widely acknowledged as the sixth base in the mammalian genome, succeeding its precursor, 5mC, which is considered the fifth base. Increasingly, 5hmC is being recognized not merely as an intermediate of methylcytosine (5mC) on its way to demethylation, but as a distinct base in its own right. While the role of 5mC in epigenetic regulation is well-established, the function of 5hmC has not been clearly elucidated. For many years, the predominant method for quantifying DNA methylation at a single-base level was bisulfite sequencing (BS-seq), regarded as the gold standard for methylation profiling. The process involves treating DNA with sodium bisulfite, which rapidly deaminates unmodified cytosine to uracil. After polymerase chain reaction (PCR) amplification and sequencing, this uracil is interpreted as thymine, resulting in a C→T conversion. In contrast, 5mC is deaminated much more slowly, remaining unconverted and read as C (7). Unfortunately, bisulfite sequencing has proven inadequate in distinguishing between 5mC and 5hmC, leading to the neglect of 5hmC in many studies.
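The read-out logic of bisulfite conversion described above can be sketched as a toy lookup. This is only an illustration under stated assumptions: the names `BS_READOUT` and `bisulfite_read` and the example template are hypothetical, and no real sequencing data or software is implied.

```python
# Toy model of the bisulfite read-out described in the text.
# The table name, function name, and template are illustrative assumptions.
BS_READOUT = {
    "C": "T",     # unmodified cytosine is deaminated to uracil and read as T
    "5mC": "C",   # deaminated much more slowly, so it is read as C
    "5hmC": "C",  # also read as C, hence indistinguishable from 5mC
    "A": "A",
    "G": "G",
    "T": "T",
}


def bisulfite_read(template):
    """Return the base string reported after conversion, PCR, and sequencing."""
    return "".join(BS_READOUT[base] for base in template)


template = ["A", "C", "G", "5mC", "G", "5hmC", "G", "T"]
print(bisulfite_read(template))
```

In this sketch, the 5mC and 5hmC positions both come out as C, which is the ambiguity that motivated the methods reviewed below.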

With recent advancements in techniques, several methods have overcome the limitations of bisulfite conversion, enabling the distinction and measurement of 5mC and 5hmC. This review aims to consolidate these cutting-edge methods and underscore the emerging understanding of the gene-regulatory capabilities of 5hmC, distinct from 5mC, and its significance in cellular regulation.

TET-MEDIATED 5-METHYLCYTOSINE OXIDATION

In plants, the DNA glycosylase family, REPRESSOR OF SILENCING 1 (ROS1)/DEMETER (DME), can identify and directly remove the 5mC base, leading to the restoration of unmodified cytosine through BER. In contrast, in mammals, where ROS1/DME-like proteins are absent, active DNA demethylation occurs through a TET-dependent mechanism. TET proteins, identified in 2009, serve as the mammalian counterparts to trypanosome proteins JBP1 and JBP2, known for oxidizing the 5-methyl group of thymine (4). The primary structure of TETs includes a carboxy-terminal catalytic domain, comprising a cysteine-rich domain (CRD) and two double-stranded β-helix (DSBH) regions, which are separated by a large low-complexity insert (Fig. 1B). TET proteins operate as iron(II)/α-ketoglutarate (Fe(II)/α-KG)-dependent dioxygenases. The DSBH domain facilitates the assembly of Fe(II), α-KG, and 5-methylcytosine (5mC) for oxidation, with the cysteine-rich domain enveloping the DSBH core to stabilize the overall structure and promote TET–DNA interaction. Full-length TET1 and TET3 feature a CXXC domain at their amino terminus, while in TET2, the putative CXXC domain is separated from the protein due to a genomic inversion during evolution (8).

TET-mediated oxidation can target 5mC, 5hmC, and 5fC as potential substrates. However, TET enzymes may display varying binding affinities or catalytic activities toward the three substrates. Human TET1 and TET2 enzymes preferentially act on 5mC-DNA substrates over 5hmC- and 5fC-DNA substrates. Furthermore, the conversion of 5mC to 5hmC by these enzymes is faster than the conversions from 5hmC to 5fC and from 5fC to 5caC (9). Human TET2 shows diminished activity towards 5mCpC and 5mCpA in comparison to 5mCpG. Consequently, TET may preferentially target 5mC in a CpG context (10). TET1 occupancy exhibits a positive correlation with CpG density. This binding preference is partially attributed to the CXXC domain of TET, which favors CpG-rich regions. Deletion of the CXXC domain from Xenopus laevis Tet3 (xlTET3) abolishes its ability to occupy target gene promoters. Despite the absence of a CXXC interaction motif in TET2, IDAX, an ancestral CXXC protein separated from TET2 due to chromosomal rearrangement, is implicated in both recruiting TET2 to target genes and regulating its protein stability (11).

GENOMIC DISTRIBUTION AND POSSIBLE FUNCTIONS OF 5hmC

The landscape of 5hmC in ESCs

Base-resolution maps of 5hmC in human and mouse ESCs revealed that 99.89% of 5hmC in human ESCs (98.7% in mouse ESCs) exists in the CpG context, with the remaining portion found in CHG and CHH contexts (12). Distinct from 5mC, 5hmC exhibits high clustering and enrichment at distal regulatory elements, including p300-binding sites and DNase I hypersensitive sites. Significantly elevated levels of 5hmC are observed across all categories of distal regulatory elements compared to promoter-proximal elements (Fig. 2). Between 44% and 74% of distal regulatory elements demonstrate notable enrichment with 5hmC in both human and mouse ESCs.

Genome-wide analysis of the association between 5hmC and diverse histone modifications revealed relatively strong correlations between 5hmC, Histone H3 Lysine 4 monomethylation (H3K4me1), and H3K4me2, with a moderate correlation observed between 5hmC and Histone H3 Lysine 27 acetylation (H3K27ac) in human ESCs. In contrast, 5hmC showed relatively low levels of correlation with H3K9ac and H3K9me3 (13). Notably, 5hmC exhibited a positive correlation with active enhancer histone modifications, in contrast to its absence of correlation with the repressive heterochromatin marker H3K9me3. This pattern differs from the associations observed with 5mC.

While 5mC presence in promoter regions was linked to low transcription levels, 5hmC was correlated to elevated transcription levels. Genes specifically enriched for 5hmC displayed higher transcription levels compared to those lacking both 5mC and 5hmC. Moreover, promoters enriched for both 5hmC and 5mC demonstrated higher transcription levels than those specifically enriched for 5mC, suggesting that the presence of 5hmC partially alleviates the silencing effect of 5mC (14).

The landscape of 5hmC in neuronal cells

In neuronal cells, 5hmC is approximately 10-fold more abundant than in other tissues or ESCs. Mouse cortical excitatory neurons (mean = 25%) and frontal cortex (mean = 16%) exhibited significantly higher levels of 5hmC compared to mouse ESCs (mean = 1%) (15). Simultaneous single-cell profiling of 5mC and 5hmC in mouse brains unveiled the epigenetic heterogeneity of the cells (16). The global levels of true 5mC (ranging from 49.5% to 62.3%) and 5hmC (ranging from 9.5% to 28.7%) show greater variability across cortical neuronal and nonneuronal cell types compared to BS-seq (ranging from 71.8% to 78.5%). Remarkably, postnatal nonneuronal cells (9.5%) exhibit higher levels of 5hmC than those detected in the embryonic brains (approximately 5%).

Unlike ESCs, cortical NPCs and neurons exhibit an absence of 5hmC at p300 sites. During neuronal differentiation, there is an increase in 5hmC signal, particularly in intragenic regions, with minimal change in 5mC. The genes strongly enriched with intragenic 5hmC demonstrate higher transcript levels compared to other genes, a trend more pronounced in neurons. This gene set, enriched in brain-expressed genes, includes many critical genes for neuronal differentiation, migration, or axon guidance (17). Within these genes, the acquisition of 5hmC is frequently coupled with the depletion of H3K27me3. Disrupting the function of the H3K27 methyltransferase Ezh2 or Tet2 and Tet3 results in deficiencies in neuronal differentiation.

The landscape of 5hmC in various cell types

The role of 5hmC is being elucidated not only in the differentiation processes of neural cells but also in other cell types. Through an integrative epigenomic approach, the researchers unveiled the dynamics of 5hmC during the pancreatic differentiation of human ESCs (18). The observed positive correlation between 5hmC, enhancer activities, chromatin accessibility, and the selective binding of lineage-specific pioneer transcription factors during pancreatic differentiation highlighted the intricate regulatory mechanisms at play. 5hmC is enriched at enhancers at each differentiation stage and 5hmC marked enhancers show higher transcriptional activities. Similar regulatory effects were observed during the process of T cell differentiation (19). During T-cell development and lineage specification, 5hmC undergoes dynamic changes, showing enrichment in active thymus-specific enhancers. Additionally, 5hmC is enriched in the gene body of highly expressed genes at all developmental stages. He et al. conducted a whole-genome profiling of the 5hmC landscape at single-base resolution across 19 types of human tissues (20). 5hmC exhibits a preference for decorating gene bodies and surpasses gene body 5mC in its ability to reflect gene expression. Roughly one-third of 5hmC peaks are categorized as tissue-specific differentially-hydroxymethylated regions, strategically positioned in areas that have the potential to influence the expression of nearby tissue-specific functional genes.

The landscape of 5hmC in reprogramming cells

DNA methylation patterns undergo reprogramming through global demethylation followed by de novo methylation in primordial germ cells (PGCs) and preimplantation embryos (21, 22). This process involves genome-wide loss of 5mC through replication-coupled dilution (passive demethylation) and the conversion of 5mC to 5hmC by TET enzymes (active demethylation). The repression of DNMT3A/B and UHRF1, which targets DNMT1 to replication foci, leads to passive demethylation. Conversely, increased expression of the TET family has been reported in reprogramming cells. The active demethylation process contributes to passive demethylation since UHRF1 and DNMT1 exhibit less affinity and efficiency at 5hmC:C dyads compared to 5mC:C dyad (23, 24).

Active demethylation was initially studied in the context of the zygote. Following fertilization, there is a rapid decrease in the 5mC signal on the zygotic paternal genome compared to the maternal genome, and this reduction cannot be fully explained by replication-dependent dilution (25). Notably, Tet3 is predominantly localized in the paternal pronucleus in mouse zygotes (26). Because methylation decreases more from sperm to the paternal genome than from oocyte to the maternal genome, CpG sites on the paternal genome are more extensively demethylated. In 5hmC sequencing results, paternal genomes showed 8.88%, while maternal genomes showed 2.13% 5hmC levels on average (27). The processes of passive and active demethylation on the paternal and maternal genomes in the zygote were investigated by using the replication inhibitor aphidicolin and deleting Tet3, respectively (3). Both paternal and maternal genome demethylation partially depend on Tet3 activity, indicating that both genomes undergo widespread active and passive demethylation in zygotes. Over 50% of demethylated CpG sites on the paternal genome depend on DNA replication only. Approximately 20% of CpGs depend on both Tet3 and DNA replication, while another 20% depend solely on Tet3, as defined by improved reduced representation bisulfite sequencing (RRBS). In the maternal genome, the percentage of demethylation solely dependent on Tet3 was lower. The demethylation process initiated in the zygote is primarily driven by passive demethylation, while active demethylation by TET appears to be crucial for locus-specific regulation.

Distinct roles of the TET family in 5hmC regulation

Observing the effects of knocking out TET enzymes provides insight into the role that 5hmC plays in genome regulation and helps distinguish the specific roles of TET1, TET2, and TET3. While Tet1 and Tet2 are abundantly expressed in mouse ESCs, the loss of 5hmC is more pronounced in Tet2 −/− ESCs (90.7%) compared to Tet1 −/− ESCs (44%). Tet3 is absent in ESCs and is only induced upon differentiation. It is expressed in oocytes and zygotes, where it facilitates DNA demethylation (28). Global depletion of 5hmC is observed at promoters, gene bodies, and enhancers upon the loss of Tet2 (29). Intriguingly, the loss of either Tet1 or Tet2 results in both increases and decreases in 5mC. The regions displaying an increase in 5mC upon Tet2 loss predominantly co-localize with H3K4me1 and H3K27ac, or with sites of p300 binding or DNase I hypersensitivity. This could lead to a reduction in enhancer activity, delaying gene induction during the early stages of differentiation. Another study further demonstrated the distinctive roles of TET1 and TET2 in the regulation of 5hmC. Selective depletion of Tet1 or Tet2 by shRNA revealed that Tet1 depletion diminishes 5hmC levels at transcription start sites (TSS), while Tet2 depletion is predominantly associated with decreased 5hmC in gene bodies (Fig. 2). In contrast, at TSS regions, Tet2 depletion leads to increased 5hmC, potentially due to the redundant activity of Tet1, which showed upregulation in Tet2-depleted cells (30).

DETECTION OF 5mC AND 5hmC

Affinity enrichment-based methods

Affinity enrichment methods capture methylated regions of the genome after DNA fragmentation, using antibodies (methylated DNA immunoprecipitation sequencing, MeDIP-seq) or methyl-CpG-binding proteins (MBD-seq, MethylCap-seq), followed by next-generation sequencing (NGS) to profile 5mC. Notably, when antibodies specific for 5hmC are used (hMeDIP), the same approach captures 5hmC (14).

The phage-derived T4 β-glucosyltransferase (T4-bGT) uses uridine diphosphate glucose (UDP-glucose) to transfer glucose to 5hmC, producing the β-anomer of glucosyl-5hmC (5ghmC). This 5hmC-specific labeling can be used for affinity enrichment-based detection of 5hmC. hmC-Seal employs the native T4-bGT, albeit with an unconventional substrate: a chemically modified UDP-glucose derivative that incorporates an azide functional group (UDP-6-azide-glucose). This derivative selectively labels all 5hmC bases with the azido-modified glucose. Subsequently, the azido group can be linked to a biotin-containing alkyne using copper-free click chemistry. The conventional biotin-streptavidin interaction is then utilized to capture molecules containing 5hmC bases (31). hmC-Seal is more sensitive than hMeDIP-seq and can pull down regions with extremely low 5hmC content from as few as 1,000 cells.

DNA methylation profiling techniques based on affinity enrichment sample only a fraction of the genome, thereby reducing sequencing effort and cost. However, notable drawbacks include a bias towards hypermethylated regions and the inability to determine the precise location of 5mC/5hmC within the reads.

Chemical approaches

BS-seq, extensively used for mapping DNA methylation patterns, employs sodium bisulfite, which selectively deaminates unmodified C to uracil (U) while preserving 5mC and 5hmC. In the subsequent PCR and sequencing steps, U is read as thymine (T) (Fig. 3A). By comparing bisulfite-converted sequences with the reference genome, 5mC and 5hmC can be distinguished from unmodified C. However, 5hmC reacts with bisulfite to produce cytosine-5-methylenesulfonate (CMS). Because CMS base pairs with G during amplification, the original 5hmC base becomes indistinguishable from 5mC in sequencing. 5fC and 5caC present another challenge with bisulfite treatment, as they undergo deamination similar to unmodified cytosine. In Whole-Genome Bisulfite Sequencing (WGBS), bisulfite treatment is performed after adaptor tagging, resulting in bisulfite-induced fragmentation of adaptor-tagged template DNAs. In contrast, post-bisulfite adaptor tagging (PBAT) performs bisulfite treatment before adaptor tagging, thereby avoiding bisulfite-induced fragmentation of adaptor-tagged template DNAs. The PBAT method can produce a significant number of unamplified reads even with subnanogram quantities of DNA, and it succeeds with limited input material, such as fewer than 100 cells (32). Oxidative bisulfite sequencing (oxBS-Seq) uses potassium perruthenate (KRuO4) to oxidize 5hmC to 5fC, eliminating the 5hmC signal from WGBS and thereby exclusively detecting 5mC (33). Although base-resolution 5hmC levels can be derived by subtracting oxBS results from WGBS results, this subtraction-based approach lacks high accuracy.
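The subtraction approach can be illustrated with a short sketch for a single cytosine site. It assumes that the BS-seq C-read fraction reports 5mC + 5hmC and that the oxBS-seq C-read fraction reports 5mC alone, as described above; the function names and read counts are hypothetical.

```python
def modified_fraction(c_reads, t_reads):
    """Fraction of reads at a site that still report C after conversion."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def subtract_oxbs(bs_c, bs_t, oxbs_c, oxbs_t):
    """Estimate 5mC and 5hmC levels at one site from paired libraries.

    BS-seq C fraction   ~ 5mC + 5hmC
    oxBS-seq C fraction ~ 5mC only
    """
    bs_level = modified_fraction(bs_c, bs_t)
    oxbs_level = modified_fraction(oxbs_c, oxbs_t)
    return {"5mC": oxbs_level, "5hmC": bs_level - oxbs_level}


# Hypothetical read counts (C reads, T reads) at 20x coverage per library.
consistent = subtract_oxbs(16, 4, 12, 8)  # 0.80 - 0.60
noisy = subtract_oxbs(14, 6, 15, 5)       # 0.70 - 0.75, a negative estimate
print(consistent, noisy)
```

Because the 5hmC estimate is a difference between two independently sampled proportions, it carries the sampling error of both libraries, and sites with modest coverage can return negative values, as in the second example.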

A major drawback of bisulfite conversion-based methods is that bisulfite-treated DNA largely lacks cytosine, which reduces sequence complexity and increases redundancy. Converting unmodified cytosine, which represents around 95% of all cytosine in the genome, leads to reduced complexity, lower mapping efficiency, and biased genomic coverage. Additionally, the harsh chemical reaction causes DNA damage and loss.

A bisulfite-free method, Chemical-assisted pyridine borane sequencing (CAPS), can be utilized for the specific sequencing of 5hmC (34, 35). CAPS employs potassium ruthenate oxidation to convert 5hmC to 5fC, followed by borane reduction of 5fC to dihydrouracil (DHU). Subsequent PCR transforms DHU to thymine. Although CAPS achieves a conversion efficiency of 83.1% for 5hmC-to-T, CAPS+ improves this approach by replacing potassium ruthenate oxidation with milder chemical oxidation reactions. It uses 4-acetamido-2,2,6,6-tetramethylpiperidine-1-oxoammonium tetra-fluoroborate (ACT+ BF4−) to oxidize 5hmC to 5fC and employs sodium chlorite (NaClO2) in the Pinnick oxidation to convert 5fC into 5caC. Consequently, CAPS+ achieves an improved conversion rate of 94.5% for 5hmC (36).

Enzymatic approaches

Various methods have been developed to overcome the limitations of bisulfite conversion. Recent findings suggest that DNA deaminases from the AID/APOBEC family can discern various cytosine modification states, offering new prospects for their inclusion in sequencing pipelines. APOBEC3A (A3A) stands out as the most active AID/APOBEC deaminase. These enzymes demonstrate robust deamination activity on unmodified C and 5mC but exhibit significantly impaired activity against 5hmC. This discrimination extends to 5fC and 5caC (6). Enzymatic methyl sequencing (EM-seq) employs TET2 and T4-bGT to convert 5mC and 5hmC into products resistant to A3A deamination. TET2 oxidizes 5mC to 5hmC, then to 5fC, and finally to 5caC, accompanied by the generation of CO2 and succinate. T4-bGT catalyzes the glucosylation of both TET2-formed and genomic 5hmC to 5ghmC. In a subsequent reaction, A3A deaminates unmodified cytosines, converting them into uracils (Fig. 3A) (37). Notably, EM-seq demonstrated effectiveness with as little as 100 pg of DNA.

APOBEC-Coupled Epigenetic sequencing (ACE-seq) can detect 5hmC, distinguishing it from 5mC and unmethylated C (15). The DNA is first denatured, because the A3A enzyme acts on single-stranded DNA. Treatment with T4-bGT then converts 5hmC to 5ghmC, thereby protecting 5hmC. Subsequent treatment with A3A deaminates unmethylated C and 5mC, which are read as T, while 5hmC remains as C in the read. When DNA was treated with A3A without T4-bGT, approximately 10% of 5hmC underwent deamination. Although this conversion rate is significantly lower than that of 5mC, it is attributed to A3A’s preference for TThmC sites and may be influenced by excess amounts. When T4-bGT and A3A were used together, approximately 99.4% of all the 5hmC bases were identified as cytosine in sequencing. The nonconversion rates of cytosine and 5mC were low (0.3% and 1.3%, respectively), indicating high specificity. ACE-seq is nondestructive and needs 1,000-fold less DNA input than conventional methods. However, ACE-seq converts unmethylated cytosine to thymine, resulting in reduced sequence complexity.
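To see what these rates imply for a measured signal, the following sketch computes the expected fraction of C reads at a site from its underlying composition. The three rates are the ones quoted above; the function name and the site compositions are hypothetical, and the model assumes each base state is read independently at its reported rate.

```python
# Probability that each cytosine state is read as C after ACE-seq,
# taken from the rates quoted in the text.
READ_AS_C = {
    "C": 0.003,     # nonconversion rate of unmodified cytosine
    "5mC": 0.013,   # nonconversion rate of 5mC
    "5hmC": 0.994,  # protected 5hmC identified as cytosine
}


def apparent_5hmc(fractions):
    """Expected C-read fraction (apparent 5hmC) for a given site composition.

    `fractions` maps "C", "5mC", and "5hmC" to their true fractions,
    which must sum to 1.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fractions[state] * READ_AS_C[state] for state in READ_AS_C)


# Hypothetical sites: one without any 5hmC, one with 20% 5hmC.
background = apparent_5hmc({"C": 0.2, "5mC": 0.8, "5hmC": 0.0})
signal = apparent_5hmc({"C": 0.2, "5mC": 0.6, "5hmC": 0.2})
print(round(background, 4), round(signal, 4))
```

Under these assumptions, a heavily methylated site with no 5hmC would still show roughly 1% apparent 5hmC, which indicates the background level against which low 5hmC values need to be interpreted.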

A method called Direct methylation sequencing (DM-seq) has been developed to distinguish 5mC from unmethylated C and 5hmC. In the first step, an engineered methyltransferase (MTase), with an S-adenosyl-l-methionine (SAM) analog, is used to create a modified cytosine base that is resistant to A3A deamination. MTases with neomorphic carboxymethyltransferase (CxMTase) activity have been discovered and engineered. A single active site point mutation allows carboxy-SAM (CxSAM) to be efficiently accepted as a substrate, creating an A3A-resistant 5-carboxymethylcytosine (5cxmC) base at unmodified C. After treatment with T4-bGT to protect 5hmC, A3A selectively deaminates 5mC, resulting in a C-to-T conversion (Fig. 3A) (38).

Chemical + enzymatic approaches

TET-assisted bisulfite sequencing (TAB-Seq) employs TET oxidation and T4-bGT glucosylation to directly detect 5hmC (12). The activities of TET on 5mC and 5hmC are decoupled by initially converting all 5hmC to 5ghmC with UDP-glucose and T4-bGT. The 5ghmC bases are shielded from TET-mediated oxidation, while 5mC bases undergo oxidation to 5fC or 5caC. Subsequent bisulfite treatment renders only the original 5hmC bases resistant to deamination (Fig. 3A).

TET-assisted pyridine borane sequencing (TAPS) is employed for the detection of 5mC and 5hmC (34). This method combines TET oxidation of 5mC and 5hmC to 5caC with pyridine borane reduction of 5caC to DHU. Subsequent PCR transforms DHU to thymine, facilitating a C-to-T transition for 5mC and 5hmC (Fig. 3A). TAPS allows direct and highly sensitive detection of modifications, maintaining specificity without impacting unmodified cytosines. Through TET oxidation, approximately 96% of cytosine modifications are oxidized to 5caC, while 3% are oxidized to 5fC. Following borane reduction, over 99% of the cytosine modifications were converted into DHU. TAPS demonstrated a higher mapping rate in sequencing results. The reduced information content of bisulfite sequencing-converted reads, along with DNA degradation from bisulfite treatment, resulted in significantly lower mapping rates for WGBS compared to TAPS (70% vs. 90%). The attachment of glucose to 5hmC, mediated by T4-bGT, protects against both TET oxidation and borane reduction. This process, termed TAPSβ, facilitates the exclusive sequencing of 5mC. Recent reports have also highlighted drawbacks in TAPS-based methods; specifically, poor polymerase amplification of the DHU base. DNA containing DHU may be less efficiently amplified, leading to reduced amplification of methylated DNA compared to unmethylated DNA (38).
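The read-outs of the chemical and enzymatic methods described in the preceding sections can be summarized as a lookup table. The sketch below is a simplified reading of the text, not a reference implementation: the names `READOUT` and `reported_modifications` are hypothetical, conversions are treated as complete, and the CAPS entry for 5mC assumes that 5mC is left untouched, which the text implies but does not state.

```python
# Idealized sequencing read-out (C or T) for each cytosine state,
# following the method descriptions in the text.
READOUT = {
    "BS-seq":   {"C": "T", "5mC": "C", "5hmC": "C"},
    "oxBS-seq": {"C": "T", "5mC": "C", "5hmC": "T"},
    "CAPS":     {"C": "C", "5mC": "C", "5hmC": "T"},  # 5mC entry is an assumption
    "EM-seq":   {"C": "T", "5mC": "C", "5hmC": "C"},
    "ACE-seq":  {"C": "T", "5mC": "T", "5hmC": "C"},
    "DM-seq":   {"C": "C", "5mC": "T", "5hmC": "C"},
    "TAB-seq":  {"C": "T", "5mC": "T", "5hmC": "C"},
    "TAPS":     {"C": "C", "5mC": "T", "5hmC": "T"},
    "TAPSβ":    {"C": "C", "5mC": "T", "5hmC": "C"},
}


def reported_modifications(method):
    """Modifications whose read-out differs from that of unmodified C."""
    calls = READOUT[method]
    return {state for state in ("5mC", "5hmC") if calls[state] != calls["C"]}


for method in READOUT:
    print(method, sorted(reported_modifications(method)))
```

A modification is detectable in this model only when its read-out differs from that of unmodified cytosine, so methods that leave unmodified C as C (CAPS, DM-seq, TAPS, TAPSβ) also preserve most of the original sequence complexity.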

Single-cell sequencing method

Single-cell sequencing has become important for uncovering cellular diversity, dynamics, and heterogeneity, which is essential for gaining a comprehensive understanding of biological systems. Notably, single-cell analysis for DNA modifications has been developed.

The PBAT method, relying on bisulfite conversion, was implemented to minimize DNA loss in single-cell analysis (39). Furthermore, the integration of multiomics approaches has facilitated the concurrent examination of transcriptome, chromatin accessibility, and 3D genome structure along with DNA modifications in single-cell sequencing (40, 41).

In single-cell 5hmC analysis, the ACE-seq method was employed. The process involves separate steps for DNA fragmentation, 5hmC glucosylation, and ssDNA denaturation, each requiring multiple rounds of DNA purification, which poses challenges for single-cell 5hmC sequencing. Fabyanic et al. proposed bisulfite-assisted ACE-seq (bACE-seq), in which bisulfite treatment simultaneously fragments and denatures the DNA while chemically protecting 5hmC through the formation of cytosine-5-methylenesulfonate (CMS) (16). This offers a high-efficiency single-tube workflow for generating the ssDNA substrate preferred by A3A. A3A-deaminated ssDNA can then be efficiently captured for low-input bulk or single-cell 5-hydroxymethylome sequencing. Additionally, bisulfite conversion deaminates 5fC and 5caC, further enhancing 5hmC profiling accuracy.

However, challenges persist in single-cell sequencing techniques, with issues such as low mapping rates and genome coverage stemming from DNA damage during the experimental process. Therefore, further refinement and improvement of experimental methods are necessary to address these issues.

Third-generation sequencing method

Nanopore sequencing by Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing by Pacific Biosciences (PacBio) are techniques known for their capability to sequence native DNA and deduce base modifications by analyzing their effects on the raw sequencing signal (42, 43).

PacBio’s SMRT sequencing operates on the principle of sequencing-by-synthesis, wherein the sequence of a circular DNA template is determined by fluorescence pulses. Each pulse corresponds to the addition of a labeled nucleotide by a polymerase fixed at the well’s base. Although DNA modifications don’t change the base-called sequence, they do influence polymerase kinetics (Fig. 3B). By analyzing inter-pulse durations, DNA modifications such as N6-methyladenosine (6mA), 5mC, and 5hmC can be inferred by comparing a modified template to an in silico model or an unmodified template. The nature of kinetic perturbations can be more intricate and context-specific, with the extent of these perturbations contingent on the type of DNA modification. The SMRT technology exhibits high sensitivity in detecting 4-methylcytosine (4mC) and 6mA, yet the kinetic signal changes induced by 5mC modification are exceedingly subtle. The subtle effects of 5mC and 5hmC necessitate increased coverage to 250X unless they are enriched or modified to induce a larger kinetic effect, such as glycosylation or TET-conversion to 5-carboxylcytosine (44). However, the enhancement of the SMRT signals analysis improved 5mC detection with SMRT sequencing by utilizing a holistic approach to analyze kinetic signals from a DNA polymerase and sequence context for each base within a measurement window. This methodology, referred to as the holistic kinetic (HK) model, directly examines 5mC. With a sensitivity of 90% and a specificity of 94%, this approach facilitates genome-wide 5mC detection at single-base resolution (45).
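As a rough guide to what a sensitivity of 90% and a specificity of 94% mean for individual calls, the sketch below applies the standard positive predictive value formula. The sensitivity and specificity are the values quoted above; the function name and the two prevalence values (the fraction of truly methylated sites among those tested) are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive calls that are truly modified (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive calls expected with these inputs")
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity reported for the HK model in the text.
SENSITIVITY, SPECIFICITY = 0.90, 0.94

# Hypothetical prevalences: a highly methylated and a sparsely methylated context.
ppv_high = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.70)
ppv_low = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.05)
print(round(ppv_high, 3), round(ppv_low, 3))
```

With the same classifier, the proportion of correct positive calls depends strongly on how common the modification is among the sites examined, which matters when moving from abundant 5mC to rarer modifications.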

Nanopore sequencing involves measuring the variation of ionic current as a single-stranded nucleic acid is passed through a biological nanopore. Neural networks then convert the current trace into nucleotides through basecalling. DNA modifications induce variations in the raw signal, enabling their detection. It’s acknowledged that the identification of 6mA, 5mC, and 5hmC varies depending on the type of trained algorithms employed in the software used. For example, DeepMod employs training data derived from bisulfite sequencing-confirmed fully methylated or unmethylated DNA. For 5mC detection, DeepMod achieves an average precision of up to 0.99 for both synthetically introduced and naturally occurring modifications. Regarding 6mA detection, DeepMod achieves an average precision of approximately 0.9 based on Escherichia coli data (46).

The limitation of PacBio and Nanopore methods is that the DNA cannot be amplified, potentially restricting input amounts to the microgram scale (47). However, these methods are still undergoing active development. Their key benefit lies in their ability to sequence native DNA and deduce DNA modifications by analyzing the raw sequencing signal. There is no longer a requirement for specific enzymatic or chemical treatments tailored to each DNA modification of interest, thereby expanding the range of detectable modifications and simplifying experimental procedures.

CONCLUDING REMARKS

Recent advances have made it possible to distinguish 5mC and 5hmC at base resolution using various chemicals and enzymes. Additionally, third-generation sequencing technologies are continually evolving with diverse potentials. Depending on the experimental goals, considerations such as cost, required input DNA amount, and target modifications can guide the selection of an appropriate method.

The significance of the unique genomic distribution of 5hmC, distinct from 5mC, in various cell types, coupled with its regulatory roles in gene expression and chromatin structure, is becoming increasingly clear. It is acknowledged not only in cellular differentiation processes but also as a significant factor in the onset of diseases. With the continued advancement of techniques and ongoing research, our understanding of the mechanism of 5hmC as a genomic regulatory factor in various biological phenomena will expand.

ACKNOWLEDGEMENTS

Sun-Min Lee is supported by the Brain Pool programme funded by the Ministry of Science and ICT through the National Research Foundation of Korea (2022H1D3A2A02063272).

CONFLICTS OF INTEREST

The author has no conflicting interests.

FIGURES
Fig. 1. DNA methylation and demethylation dynamics. (A) The methyl group is chemically connected to the 5-position of the cytosine base through a stable carbon–carbon bond mediated by DNA methyltransferase (DNMT) enzymes. Enzymes of the TET family catalyze the stepwise oxidation of 5-methylcytosine in DNA to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). Thymine DNA glycosylase (TDG)-mediated excision of 5fC and 5caC, coupled with base excision repair (BER), results in demethylation. Members of the AID/APOBEC family can catalyze the deamination of cytosine to generate uracil. Deamination of 5mC or 5hmC could produce thymine or 5-hydroxymethyluracil (5hmU), respectively. The T:G mismatch leads to subsequent repair by TDG/BER. (B) The Ten-Eleven Translocation (TET) family proteins exhibit a specific structure and contain functional domains. All the TET family members share a conserved carboxyl-terminal core catalytic domain, which includes a double-stranded β-helix (DSBH) domain and a cysteine-rich domain.
Fig. 2. Patterns of 5mC and 5hmC enrichment in regions associated with actively expressed genes. Distinct enrichment patterns of 5mC and 5hmC in the regulatory regions of actively expressed genes in mouse ESCs, governed by TET1 and TET2.
Fig. 3. Detection of DNA modifications. (A) Chemical and enzymatic approaches are employed in conjunction with sequencing techniques to discern various cytosine modifications. BS-seq = bisulfite sequencing, both 5mC and 5hmC; oxBS-Seq = oxidative bisulfite sequencing, 5mC; CAPS = chemical-assisted pyridine borane sequencing, 5hmC; EM-seq = Enzymatic Methyl-seq, both 5mC and 5hmC; ACE-Seq = APOBEC-coupled epigenetic sequencing, 5hmC; DM-seq = Direct Methylation sequencing, 5mC; TAB-Seq = TET-assisted bisulfite sequencing, 5hmC; bACE-Seq = bisulfite-assisted ACE-seq, 5hmC; TAPS = TET-assisted pyridine borane sequencing, both 5mC and 5hmC; TAPSβ = β-glucosyltransferase and TET-assisted pyridine borane sequencing, 5mC. The references for each method are indicated. (B) Third-generation long-read sequencing method for detecting DNA modifications. The image was adapted from (48).
REFERENCES
  1. Hotchkiss RD (1948) The quantitative separation of purines, pyrimidines, and nucleosides by paper chromatography. J Biol Chem 175, 315-332.
  2. Yokochi T and Robertson KD (2002) Preferential methylation of unmethylated DNA by Mammalian de novo DNA methyltransferase Dnmt3a. J Biol Chem 277, 11735-11745.
  3. Guo F, Li X and Liang D et al (2014) Active and passive demethylation of male and female pronuclear DNA in the mammalian zygote. Cell Stem Cell 15, 447-459.
  4. Tahiliani M, Koh KP and Shen Y et al (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930-935.
  5. He YF, Li BZ and Li Z et al (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333, 1303-1307.
  6. Nabel CS, Jia H and Ye Y et al (2012) AID/APOBEC deaminases disfavor modified cytosines implicated in DNA demethylation. Nat Chem Biol 8, 751-758.
  7. Frommer M, McDonald LE and Millar DS et al (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 89, 1827-1831.
  8. Pastor WA, Aravind L and Rao A (2013) TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat Rev Mol Cell Biol 14, 341-356.
  9. Hu LL, Lu JY and Cheng JD et al (2015) Structural insight into substrate preference for TET-mediated oxidation. Nature 527, 118-122.
  10. Hu LL, Li Z and Cheng JD et al (2013) Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation. Cell 155, 1545-1555.
  11. Dunican DS, Pennings S and Meehan RR (2013) The CXXC-TET bridge - mind the methylation gap! Cell Res 23, 973-974.
  12. Yu M, Hon GC and Szulwach KE et al (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149, 1368-1380.
  13. Szulwach KE, Li XK and Li YJ et al (2011) Integrating 5-hydroxymethylcytosine into the epigenomic landscape of human embryonic stem cells. PLoS Genet 7, e1002154.
  14. Ficz G, Branco MR and Seisenberger S et al (2011) Dynamic regulation of 5-hydroxymethylcytosine in mouse ES cells and during differentiation. Nature 473, 398-402.
  15. Schutsky EK, DeNizio JE and Hu P et al (2018) Nondestructive, base-resolution sequencing of 5-hydroxymethylcytosine using a DNA deaminase. Nat Biotechnol 36, 1083-1090.
  16. Fabyanic EB, Hu P and Qiu Q et al (2023) Joint single-cell profiling resolves 5mC and 5hmC and reveals their distinct gene regulatory effects. Nat Biotechnol 41, 1-15.
  17. Hahn MA, Qiu R and Wu X et al (2013) Dynamics of 5-hydroxymethylcytosine and chromatin marks in mammalian neurogenesis. Cell Rep 3, 291-300.
  18. Li J, Wu X and Zhou Y et al (2018) Decoding the dynamic DNA methylation and hydroxymethylation landscapes in endodermal lineage intermediates during pancreatic differentiation of hESC. Nucleic Acids Res 46, 2883-2900.
  19. Tsagaratou A, Aijo T and Lio CW et al (2014) Dissecting the dynamic changes of 5-hydroxymethylcytosine in T-cell development and differentiation. Proc Natl Acad Sci U S A 111, E3306-E3315.
  20. Cui XL, Nie J and Ku J et al (2020) A human tissue map of 5-hydroxymethylcytosines exhibits tissue specificity through gene and enhancer modulation. Nat Commun 11, 6161.
  21. Tang WW, Dietmann S and Irie N et al (2015) A unique gene regulatory network resets the human germline epigenome for development. Cell 161, 1453-1467.
  22. Irie N, Lee SM and Lorenzi V et al (2023) DMRT1 regulates human germline commitment. Nat Cell Biol 25, 1439-1452.
  23. Frauer C, Hoffmann T and Bultmann S et al (2011) Recognition of 5-hydroxymethylcytosine by the Uhrf1 SRA domain. PLoS One 6, e21306.
  24. Hashimoto H, Liu Y and Upadhyay AK et al (2012) Recognition and potential mechanisms for replication and erasure of cytosine hydroxymethylation. Nucleic Acids Res 40, 4841-4849.
  25. Mayer W, Niveleau A, Walter J, Fundele R and Haaf T (2000) Demethylation of the zygotic paternal genome. Nature 403, 501-502.
  26. Shen L, Inoue A, He J, Liu Y, Lu F and Zhang Y (2014) Tet3 and DNA replication mediate demethylation of both the maternal and paternal genomes in mouse zygotes. Cell Stem Cell 15, 459-471.
  27. Yan R, Cheng X and Gu C et al (2023) Dynamics of DNA hydroxymethylation and methylation during mouse embryonic and germline development. Nat Genet 55, 130-143.
  28. Dawlaty MM, Breiling A and Le T et al (2013) Combined deficiency of Tet1 and Tet2 causes epigenetic abnormalities but is compatible with postnatal development. Dev Cell 24, 310-323.
  29. Hon GC, Song CX and Du T et al (2014) 5mC oxidation by Tet2 modulates enhancer activity and timing of transcriptome reprogramming during differentiation. Mol Cell 56, 286-297.
  30. Huang Y, Chavez L and Chang X et al (2014) Distinct roles of the methylcytosine oxidases Tet1 and Tet2 in mouse embryonic stem cells. Proc Natl Acad Sci U S A 111, 1361-1366.
  31. Song CX, Szulwach KE and Fu Y et al (2011) Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol 29, 68-72.
  32. Miura F, Enomoto Y, Dairiki R and Ito T (2012) Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging. Nucleic Acids Res 40, e136.
  33. Booth MJ, Branco MR and Ficz G et al (2012) Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science 336, 934-937.
  34. Liu Y, Siejka-Zielinska P and Velikova G et al (2019) Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution. Nat Biotechnol 37, 424-429.
  35. Liu YB, Hu ZY and Cheng JF et al (2021) Subtraction-free and bisulfite-free specific sequencing of 5-methylcytosine and its oxidized derivatives at base resolution. Nat Commun 12, 618.
  36. Xu HQ, Chen JF and Cheng JF et al (2023) Modular oxidation of cytosine modifications and their application in direct and quantitative sequencing of 5-hydroxymethylcytosine. J Am Chem Soc 145, 7095-7100.
  37. Vaisvila R, Ponnaluri VKC and Sun ZY et al (2021) Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res 31, 1280-1289.
  38. Wang T, Fowler JM and Liu L et al (2023) Direct enzymatic sequencing of 5-methylcytosine at single-base resolution. Nat Chem Biol 19, 1004-1012.
  39. Smallwood SA, Lee HJ and Angermueller C et al (2014) Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11, 817-820.
  40. Clark SJ, Argelaguet R and Kapourani CA et al (2018) scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9, 781.
  41. Lee DS, Luo C and Zhou J et al (2019) Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat Methods 16, 999-1006.
  42. Flusberg BA, Webster DR and Lee JH et al (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7, 461-465.
  43. Laszlo AH, Derrington IM and Brinkerhoff H et al (2013) Detection and mapping of 5-methylcytosine and 5-hydroxymethylcytosine with nanopore MspA. Proc Natl Acad Sci U S A 110, 18904-18909.
  44. Clark TA, Lu X and Luong K et al (2013) Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol 11, 4.
  45. Tse OYO, Jiang P and Cheng SH et al (2021) Genome-wide detection of cytosine methylation by single molecule real-time sequencing. Proc Natl Acad Sci U S A 118, e2019768118.
  46. Liu Q, Fang L, Yu G, Wang D, Xiao CL and Wang K (2019) Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nat Commun 10, 2449.
  47. Kingan SB, Heaton H and Cudini J et al (2019) A high-quality de novo genome assembly from a single mosquito using PacBio sequencing. Genes (Basel) 10, 62.
  48. Xu L and Seki M (2020) Recent advances in the detection of base modifications using the Nanopore sequencer. J Hum Genet 65, 25-33.

