DNA viruses mainly maintain their genomes as episomal DNA, which is important for viral replication and gene expression (1, 2). Tethered episomes affect gene expression patterns on host chromosomes, leading to pathological consequences such as cancer development (3-6). It is therefore important to identify where viral episomes are tethered on host chromosomes. Recently developed chromosome conformation capture (3C)-derived next-generation sequencing (NGS) methods have made it possible to examine the association between viral episomes and human chromosomes. In this mini review, we summarize the basic features of DNA tumor viruses and their episomes, as well as the episome positions on human chromosomes identified by 3C-derived methods, to gain insight into tethering mechanisms and their impact on host gene expression.
Certain viruses can transform infected cells into cancer cells. To contribute to tumorigenesis, viral genetic material must persist within host cells, which it typically does by forming an episome. Tumor viruses include DNA viruses such as Epstein–Barr virus (EBV), Kaposi’s sarcoma-associated herpesvirus (KSHV), human papillomavirus (HPV), hepatitis B virus (HBV), and Merkel cell polyomavirus (MCPyV), as well as a few RNA viruses. This review mainly discusses the episomal structure of DNA tumor viruses and their typical positions on host chromosomes in relation to gene expression regulation.
An episome is a segment of genetic material that can exist independently of, or integrate into, the host chromosome. Viral genomes are remarkably diverse in nucleic acid type, size, and complexity. The genomes of DNA tumor viruses are double-stranded or partially double-stranded, and they can be linear or circular. After infection, these viral genomes are maintained either as episomes or, in some cases, by integration into the host chromosome.
EBV, which causes Burkitt lymphoma, has a large linear double-stranded DNA genome of approximately 172 kbp that encodes 80 proteins and 46 noncoding RNAs. EBV maintains latency by replicating its chromatinized episomes in synchrony with the host chromosomes. The circularized viral genome is not integrated into the host genome; it remains genomically stable while allowing the expression of the few viral genes essential for its replication. The epigenetic status of the episome profoundly influences the expression of its genes, and epigenetic modifications of the EBV genome occur during initial infection, latency, lytic replication, and virion production (7). Before the first round of EBV genome replication, the incoming EBV DNA rapidly circularizes and acquires nucleosomes in the infected cell. Episome assembly occurs during the G1 phase of the host cell cycle, long before the onset of viral DNA replication (8).
KSHV was discovered as the causative agent of AIDS-associated Kaposi sarcoma (9). The KSHV genome is a linear double-stranded DNA that rapidly circularizes after entering the nucleus and is maintained as an episome (10). During latency, KSHV maintains 50-100 genome copies per infected cell (11); these genomes replicate once per cell cycle and are segregated into daughter cells. Episomal modifications and nucleosome positioning contribute to both the activation and the inactivation of latent genes (12). On silenced episomes, transcriptional activation of the ORF50 immediate early gene (Rta) can initiate reactivation of the KSHV lytic cycle (13); during latency, ORF50 expression is repressed by the KSHV latency-associated nuclear antigen (LANA) (12). Once activated, ORF50 triggers the expression of early genes required for viral DNA replication, followed by the expression of late genes (14). Conversely, during latent infection, KSHV episomes undergo CpG methylation together with specific histone modifications, resulting in the rapid establishment of latency and suppression of lytic gene expression (15).
Because HBV infection can result in liver cirrhosis, liver failure, hepatocellular carcinoma, and even death, it is considered one of the top 20 causes of human mortality (12). HBV has a partially double-stranded, 3.2 kbp circular DNA genome covalently linked to a multifunctional polymerase that has RNA- and DNA-dependent polymerase activities as well as RNase H activity. After HBV virions infect hepatocytes, the relaxed circular DNA (rcDNA) is transported to the nucleus and converted into covalently closed circular DNA (cccDNA), which exists as an episome. In addition, some viral DNA becomes integrated into the host genome, although integration is not necessary for the viral replication cycle (16). The host RNA polymerase II uses cccDNA as the template for all viral RNAs. Newly synthesized rcDNA can also be transported back to the nucleus through an intracellular pathway to amplify the cccDNA pool (17). cccDNA does not appear to be attached to host chromosomes during mitosis; consequently, cccDNA molecules are randomly distributed between daughter cells, and some are lost during cell division (18).
Persistent infection with high-risk HPVs can lead to cancer at the sites of infection, such as the cervix and oropharynx. HPV has a circular, chromatinized double-stranded DNA genome enclosed in a non-enveloped capsid. Unlike the two herpesviruses introduced above, HPV completes its entire productive life cycle as a circular episome, starting in infected basal epithelial cells. HPV genomes are maintained at low copy number as circular episomes that replicate alongside cellular DNA (19). Integration of HPV DNA is commonly reported in HPV-associated cancer genomes; however, both integrated and episomal HPV genomes appear to be implicated in invasive cervical cancer (5). The mechanisms by which the HPV genome integrates into host chromosomes are still unknown.
MCPyV causes Merkel cell carcinoma (MCC), an aggressive and rare skin cancer. MCPyV has a typical circular double-stranded DNA genome, which resides and replicates as a replication-competent episome in persistently infected, non-malignant cells. In contrast, viral DNA in MCC tumors is frequently found integrated into the cellular genome.
The diversity of episomal maintenance strategies is closely related to the viral life cycle, including DNA replication, transcriptional modulation, and genome segregation. Moreover, because viral episomes regulate host physiology and are thereby involved in cancer development and various immune responses, studies of episome structure may point toward potential therapeutic strategies.
The goal of a virus is to replicate itself, and some viruses pass their genomes on to the next generation of cells as host cells divide. To accomplish this, viruses have developed various strategies to replicate their genomes and attach them to host chromosomes. Viral genome tethering is required either to transport the incoming viral genome into the nucleus or to maintain the genome as an episome in persistently infected cells.
Tethering of viral episomes requires proteins that bind both the episome and the host chromosome. Beyond tethering, these viral episome maintenance proteins (EMPs) may also be involved in viral replication and transcription. Although the well-known EMPs are usually encoded by viruses, cellular proteins that organize host chromosome architecture also play a role in viral episome maintenance. EBNA1 of EBV, LANA1 of KSHV, and E2 of HPV share common structural features and a common function in the stable segregation of episomes.
Episomal maintenance of EBV and KSHV has been well studied. EBNA1 is a viral protein expressed in all EBV-related tumors and is necessary for viral DNA replication and episome maintenance while latently infected cells grow and divide (20). EBNA1 has two major functional regions. The amino (N)-terminal region contains chromosome-tethering domains (CTDs) that bind to the minor groove of AT-rich scaffold-associated regions of host chromosomes (Fig. 1). The carboxy (C)-terminal region contains a DNA-binding domain (DBD) responsible for sequence-specific DNA binding; it recognizes an 18 bp palindromic sequence present in several copies at the viral origin of plasmid replication (oriP) (21-23). EBV genome tethering can be achieved not only through direct DNA recognition by these two regions but also through association with chromosome-binding proteins such as EBP2, BRD4, RCC1, HMGB2, and PARP1 (24-28).
LANA (also called LANA1) is the KSHV EMP. The C-terminal DBD of LANA1 binds to the terminal repeat region of the viral episome (29) (Fig. 1). In addition, LANA binds to the core histones H2A and H2B on the nucleosome surface (30), and this interaction is essential for KSHV genome replication and persistence (31, 32). The cellular proteins BUB1, DEK, NUMA, PARP1, and CHD4 appear to be involved in tethering the KSHV episome (33-37), but further studies are needed to clarify whether their role is direct.
The HBV X (HBx) protein is essential to initiate and maintain viral replication after infection. HBx is mostly cytoplasmic, but a minor, variable fraction localizes to the nucleus, where it is recruited to the cccDNA episome and participates in the initiation of cccDNA-driven transcription. However, HBx does not bind DNA directly; rather, it seems to interact with host transcriptional machinery proteins that do (38).
The HPV E2 protein tethers viral episomes to mitotic host chromosomes during cell division, ensuring their partitioning and maintenance. E2 is composed of three regions: the N-terminal trans-activating domain (TAD), a hinge region, and the C-terminal DBD, which binds to several E2 binding sites on the viral episome (Fig. 1) (39). The TAD and hinge region interact with host proteins on cellular chromosomes, thereby tethering viral episomes to mitotic chromosomes, where they are stably maintained (40, 41).
The large and small T antigens (LT-Ag and ST-Ag, respectively) of MCPyV are expressed immediately after nuclear delivery of the viral episome. They drive the cell cycle into S phase, which favors viral episome propagation (42). LT-Ag also possesses helicase activity and recruits host replication factors to the viral episome, functions that are essential for viral DNA replication (42). However, because research on MCC has focused mainly on viral genome integration, episome tethering by MCPyV remains unexplored.
Unlike viral integration sites, the attachment sites of episomal DNA cannot be detected by linear whole-genome sequencing, because the episome is physically separate from the host chromosome. Over the past decade, advances in microscopy and NGS technology have made it possible to identify the positions of viral episomes in the nucleus. Fluorescence in situ hybridization (FISH) has been used to identify the attachment sites of viral episomes on host chromosomes (43, 44). However, microscopic images can detect only a subset of all episomal attachment sites and thus provide only one piece of the puzzle.
3C-derived methods detect the topological structure of chromosomes (45). Briefly, cells are crosslinked with formaldehyde and the chromatin is digested with a restriction enzyme (for example, a 4 bp cutter); the resulting DNA fragments are then ligated so that fragments that were spatially close in the nucleus are joined. These proximity ligation products are detected by PCR with a set of primers in 3C or by NGS in Hi-C (Fig. 2) (46, 47). Hi-C provides ligation frequencies between all genomic loci, which can be computationally reconstructed into the three-dimensional (3D) organization of the genome. In cells infected with episomal viruses, Hi-C data also contain information about interactions between viral episomes and host chromosomes. Circular chromosome conformation capture (4C) detects the genome-wide associations of a single locus (48, 49). Because 4C amplifies only the associations of the chosen viewpoint, such as the viral episome, with host chromosomes, it requires approximately 100-fold fewer sequencing reads than Hi-C. Like 4C, capture Hi-C (CHi-C) enriches contacts involving specific bait regions (50). To detect the tethering sites of viral episomes, CHi-C uses a biotinylated RNA bait library derived from the viral genome, yielding deep sequencing information for the host loci linked to the viral genome (Fig. 2) (33, 50). 4C and CHi-C are therefore useful for detecting the tethering sites of viral episomes on host chromosomes, and they require less computationally intensive analysis than Hi-C. Nevertheless, Hi-C is preferable when the tethering sites of viral episomes need to be interpreted in the context of the 3D structure of the host genome (Fig. 2).
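As a rough illustration of how virus-host contacts can be extracted from Hi-C data, the Python sketch below assumes that read pairs have already been aligned to a combined reference in which the viral genome is a separate contig. The contig name `chrEBV`, the 10 kb bin size, and the minimal `ReadPair` record are illustrative assumptions, not part of any cited pipeline. The function keeps only chimeric pairs with exactly one mate on the viral contig and counts the host-side mates per genomic bin, giving a viewpoint-style profile comparable in spirit to a 4C or CHi-C readout.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """One aligned Hi-C read pair (hypothetical minimal record)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


VIRAL_CONTIG = "chrEBV"  # assumption: name of the viral contig in the reference
BIN_SIZE = 10_000        # assumption: 10 kb bins


def virus_host_contacts(pairs: Iterable[ReadPair],
                        viral_contig: str = VIRAL_CONTIG,
                        bin_size: int = BIN_SIZE) -> Counter:
    """Count host-side mates of virus-host chimeric pairs per (chrom, bin index)."""
    counts: Counter = Counter()
    for p in pairs:
        on_virus1 = p.chrom1 == viral_contig
        on_virus2 = p.chrom2 == viral_contig
        if on_virus1 == on_virus2:
            # Host-host and virus-virus pairs say nothing about tethering sites.
            continue
        # Keep whichever mate landed on the host genome.
        host_chrom, host_pos = (p.chrom2, p.pos2) if on_virus1 else (p.chrom1, p.pos1)
        counts[(host_chrom, host_pos // bin_size)] += 1
    return counts


if __name__ == "__main__":
    demo = [
        ReadPair("chrEBV", 1_200, "chr1", 15_500),
        ReadPair("chr1", 18_000, "chrEBV", 90_000),
        ReadPair("chr2", 5_000, "chrEBV", 3_000),
        ReadPair("chr1", 5_000, "chr3", 7_000),    # host-host pair: ignored
        ReadPair("chrEBV", 10, "chrEBV", 50_000),  # virus-virus pair: ignored
    ]
    for (chrom, b), n in sorted(virus_host_contacts(demo).items()):
        print(chrom, b * BIN_SIZE, n)
```

Real pipelines also filter by mapping quality, remove PCR duplicates, and normalize counts against a background model before calling tethering peaks; those steps are omitted here.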
Viral gene expression is regulated by epigenetic changes in viral episomes through EMPs and associated proteins. The positions of viral episomes on host chromosomes have been identified for several viruses, including EBV, KSHV, and HBV, using 3C-derived NGS methods (6, 33, 44, 51-56); these studies are summarized in Table 1.
In Burkitt lymphoma cell lines, EBV episomes were enriched in transcriptionally repressed genomic regions marked by the heterochromatin mark H3K9me3 (44). Moreover, host genes at EBV tethering sites were de-repressed when EBV episomes dissociated from them. Mechanistically, H3K9me3 enrichment was significantly decreased upon EBNA1 knockdown (shEBNA1), which causes EBV episomes to dissociate from host chromosomes (44). Therefore, in Burkitt lymphoma cell lines, EBV episomes appear to repress host gene expression through heterochromatin complexes. The tethering sites of EBV episomes in the lymphoblastoid cell line (LCL) GM12878 differed from those in Burkitt lymphoma: analyzed by 4C-seq and Hi-C, the episomes in GM12878 were reproducibly located at active promoters and at regions carrying active histone marks such as H3K27ac, H3K4me1, and H3K4me3 (44, 53). Burkitt lymphoma cells express only the viral protein EBNA1 and repress other viral proteins, a pattern classified as latency type I. LCLs, by contrast, express all EBNAs as well as the LMPs of EBV, which corresponds to latency type III and resembles the conditions of viral reactivation in Akata-Zta cells, an EBV-positive Burkitt lymphoma (BL) line (2, 57). Thus, the position of EBV episomes on host chromosomes may depend on the viral latency type.
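Statements such as "EBV episomes were enriched in H3K9me3-marked regions" rest on comparing how often tethering sites overlap a mark with how often that would happen by chance. The sketch below shows one generic way to make that comparison, assuming that both the tethering peaks and the histone-mark domains have already been reduced to sets of genomic bin indices (a simplifying assumption; real analyses work with intervals and often control for mappability and chromosome structure). It reports a fold enrichment and an empirical p-value from randomly drawn bin sets of the same size; it is not necessarily the statistical procedure used in the cited studies.

```python
import random
from typing import Set


def fold_enrichment(peak_bins: Set[int], mark_bins: Set[int], n_bins: int) -> float:
    """Fraction of peak bins carrying the mark, divided by the genome-wide marked fraction."""
    if not peak_bins or not mark_bins:
        raise ValueError("peak_bins and mark_bins must be non-empty")
    observed = len(peak_bins & mark_bins) / len(peak_bins)
    expected = len(mark_bins) / n_bins
    return observed / expected


def permutation_p_value(peak_bins: Set[int], mark_bins: Set[int], n_bins: int,
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical P(overlap >= observed) for random bin sets of the same size."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = len(peak_bins & mark_bins)
    k = len(peak_bins)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(range(n_bins), k))
        if len(shuffled & mark_bins) >= observed:
            hits += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Uniform shuffling is the simplest null model; it ignores the fact that neither peaks nor marks are uniformly distributed along chromosomes, which is why published analyses usually use more constrained backgrounds.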
The tethering sites of EBV in gastric carcinoma have been examined extensively using 4C-seq and Hi-C (6). Comparisons between EBV-associated gastric cancer cell lines and normal gastric epithelial cell lines revealed that EBV episome attachment induces a heterochromatin-to-euchromatin transition. Furthermore, EBV-infected MKN7 and GES1 cells reproducibly showed epigenetic redistribution from heterochromatin to euchromatin at sites associated with EBV episomes. These results suggest that, in gastric cancer cells, EBV episomes associate with active enhancer regions and can induce epigenetic reprogramming through an as yet unknown mechanism.
The tethering sites of KSHV episomes in the primary effusion lymphoma (PEL) cell lines BC-1, BC-3, and BCBL-1 were identified by CHi-C (33). In all three KSHV-infected PEL cell lines, KSHV episomes were preferentially associated with regions near centromeres, consistent with the finding that the KSHV episome maintenance protein LANA interacts and co-localizes with the centromeric protein CENP-F and the kinetochore protein BUB1 (33, 37). Kumar
The tethering sites of HBV episomes are likewise not randomly distributed but localized to specific genomic regions. In HBV-infected hepatocytes, HBV episomes tend to be localized at active chromatin, such as CpG islands (CGIs), transcription start sites, and enhancers (51, 55, 56). The HBV protein HBx plays central roles in the viral life cycle, including viral transcription, replication, and pathogenesis (54, 55, 59, 60). Recent studies have analyzed the role of HBx in tethering viral episomes to host chromosomes (54-56). Moreu
Viral episome tethering sites appear to favor viral replication or transcription. Interestingly, the tethering sites of EBV episomes depend on the viral latency type (44). In Burkitt lymphoma cell lines with latency type I, in which most viral genes are repressed, EBV episomes tended to associate with repressive chromatin regions. In contrast, in LCLs with latency type III, in which most viral genes are actively expressed, EBV episomes tended to associate with active chromatin regions (44). EBV episomes therefore appear to be positioned in regions that suit their replication and transcription, and the expression of EBV genes might be controlled by the host chromatin environment according to latency type. As another example, HBV cccDNA appears to be influenced by the cellular chromatin environment through the cellular protein CFP1. CFP1 binds to CGIs and recruits the methyltransferase SET1, which is responsible for H3K4me3 deposition. Interestingly, CFP1 also binds to HBV cccDNA and is required for H3K4me3 enrichment on it. CFP1 depletion significantly decreased H3K4me3 enrichment on both cellular chromatin and HBV cccDNA. Therefore, the active cellular chromatin environment linked to HBV cccDNA can influence viral replication or transcription through host factors such as CFP1 (56).
NGS-based genomic technologies and microscopic analyses have revealed that genomes are not randomly arranged but are organized hierarchically in the nucleus (61). These structures range from gene loops to topologically associating domains (TADs) and A/B compartments (61, 62). Because the host genome is not a one-dimensional linear structure, the tethering sites of viral episomes should be interpreted in the context of its three-dimensional organization. A recent study has shown that transcriptionally inactive HBV cccDNA associates with the inactive B compartment, whereas transcriptionally active HBV cccDNA preferentially interacts with the active A compartment (54).
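A/B compartments are commonly called from Hi-C data with an eigenvector approach: each diagonal of the contact matrix is divided by its mean to remove the distance-dependent decay (observed/expected), the rows of that matrix are correlated, and the leading eigenvector of the correlation matrix splits bins into two groups. The NumPy sketch below is a simplified single-chromosome version of that idea. Because the sign of an eigenvector is arbitrary, it orients the result with a per-bin activity proxy such as gene density or GC content; the `activity` argument is an assumption supplied by the caller. Bins carrying viral episome contacts could then be labeled as lying in the A or B compartment.

```python
import numpy as np


def ab_compartments(contacts: np.ndarray, activity: np.ndarray) -> np.ndarray:
    """Label each bin +1 (A) or -1 (B) from a symmetric intrachromosomal contact matrix."""
    n = contacts.shape[0]
    oe = np.zeros((n, n), dtype=float)
    # Observed/expected: divide each diagonal by its mean contact frequency,
    # which removes the decay of contacts with genomic distance.
    for d in range(n):
        diag = np.diagonal(contacts, offset=d).astype(float)
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected
    # Bins in the same compartment have similar contact profiles.
    corr = np.corrcoef(oe)
    # eigh returns eigenvalues in ascending order, so the last column is the leading eigenvector.
    _, vecs = np.linalg.eigh(corr)
    pc1 = vecs[:, -1]
    # Orient the arbitrary sign so that bins with higher activity (the A-like bins) are positive.
    if np.corrcoef(pc1, activity)[0, 1] < 0:
        pc1 = -pc1
    return np.where(pc1 >= 0, 1, -1)
```

Production tools also balance the matrix, mask low-coverage bins (which would otherwise produce undefined correlations), and often compute compartments per chromosome arm; this sketch skips those steps.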
Current knowledge of the tethering sites of viral episomes is limited to averages across cell populations. In EBV-positive Burkitt lymphoma, for example, approximately 1,000 significant 4C peaks were identified, although a single nucleus contains only 50-100 episomes (44). Because each nucleus carries far fewer episomes than the number of detected sites, it is unlikely that every nucleus uses all of these sites, and the combination of tethering sites within a single nucleus remains unknown. Single-cell Hi-C technology has been developed to determine genomic contacts in individual nuclei (63, 64). However, the few associations between viral episomes and host chromosomes in a single nucleus may be impossible to capture by single-cell Hi-C analysis. Instead, future 3D genomic methods that combine single-cell technology with 4C-seq or CHi-C may allow the specific associations between viral episomes and host chromosomes to be amplified in individual nuclei.
Cellular factors involved in the maintenance of viral episomes have not been well studied. The cellular genome organizer CTCF has been reported to associate with viral episomes and to regulate viral latency by shaping the 3D organization of the viral genome in EBV, KSHV, herpes simplex virus (HSV), cytomegalovirus (CMV), and HPV (2, 65-67). However, a role for CTCF in tethering viral episomes has not been reported, and CTCF binding sites do not appear to correlate with the tethering sites of EBV episomes (44). The cellular SMC5/6 complex is a host restriction factor for HBV infection that represses viral gene transcription, and HBx counteracts it by inducing SMC5/6 degradation (68). SMC5/6 regulated viral gene expression and, consequently, the tethering sites of HBV episomes, implying that the chromatin environment of the viral episome, rather than the presence of HBx itself, is important for positioning viral episomes on host chromosomes (54). Therefore, the role of host factors, including genome organizers such as CTCF and SMC complexes, in tethering viral episomes is an interesting topic for future research.
We have summarized the tethering sites of viral episomes according to virus, host cell, and viral latency type. A comprehensive understanding of how episomes attach to host chromosomes under these various conditions will require integrating data from different host cells and viral latency types. In addition, most studies have used virus-infected cell lines rather than clinical samples. For clinical application, it will therefore be necessary to study the epigenetic features and tethering sites of viral episomes in clinical samples.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT, Republic of Korea (No. 2019R1F1A1061826, 2019R1A4A1024764, 2022R1A2C100442311 and 2021R1A2C1010313) and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (No. HI22C1510).
The authors have no conflicting interests.
Table 1. Tethering sites of viral episomes analyzed by 3C-derived methods
| Virus | Host | Method (resolution) | Tethering sites on host chromosome | Co-localized factors | Ref. |
|---|---|---|---|---|---|
| EBV | Burkitt lymphoma cell lines (Daudi, KemIII, RaeI, Raji) | Hi-C (chromosome level) | Gene-poor chromosomes (latent); gene-rich chromosomes (reactivation) | ND | (52) |
| | Lymphoblastoid cell line (GM12878) | Hi-C (chromosome level) | Gene-poor chromosomes | ND | (52) |
| | Lymphoblastoid cell line (GM12878) | Hi-C (10 kb), 4C for validation | Typical enhancers or super-enhancers, active marks | EBNA2/3 (EBV), IKZF1/RUNX3, HDGF, NBS1/NFIC | (53) |
| | Burkitt lymphoma cell lines (MutuI, Raji) | 4C (10 kb) | Heterochromatin, silent neuronal genes | EBNA1 (EBV), EBF1, RBP-jK, H3K9me3, AT-rich flanking sequence | (44) |
| | Lymphoblastoid cell lines (Mutu-LCL, GM12878) | 4C, Hi-C (10 kb) | Active chromatin | EBNA2 (EBV), H3K27ac, H3K4me1/3 | (44) |
| | Gastric cell lines (14 EBV-associated gastric cancer cell lines, 2 normal gastric epithelial cell lines) | Hi-C (25 kb), 4C for validation | Heterochromatin-to-euchromatin transition | H3K9me3 to H3K4me1/H3K27ac | (6) |
| HBV | Primary human hepatocytes (0 and 7 days after infection) | Hi-C, CHi-C (400 kb) | Active chromatin, CpG islands (highly expressed genes) | CFP1 | (56) |
| | HepaRG hepatocytes | 4C (2/10/50/250 kb) | Nuclear subdomain associated with open chromatin | HBx (HBV) | (55) |
| | HepG2-NTCP | 3C-HTGTS | Transcription start sites, enhancers, CpG islands | H3K4me2/3, H3K9ac, H3K27ac, H3K36me3 | (51) |
| | HepG2-NTCP | 4C, Hi-C | HBV-DX: chr9 heterochromatin hub; HBV-wt: compartment A | HBV-DX: H3K9me3; HBV-wt: active chromatin; controlled by HBx and SMC5/6 | (54) |
| KSHV | PEL cell line (BC-1) | Hi-C (chromosome level) | Gene-poor chromosomes (latent) | ND | (52) |
| | PEL cell lines (BC-1, BC-3, BCBL-1) | CHi-C (10 kb) | Near centromeres (1% of total) | LANA (KSHV), ADNP, CHD4 | (33) |
ND: not determined.