CRISPR-Cas systems are diverse RNA-guided adaptive immune systems, found in ∼40% of bacteria and ∼90% of archaea, that protect their hosts against invading ‘mobile genetic elements’ (MGEs) such as plasmids, viruses, and transposons (1-4). All CRISPR-Cas systems confer heritable immunity against harmful MGEs through three basic steps (Fig. 1A). First, during the ‘adaptation’ step (also referred to as spacer acquisition), fragments of foreign DNA or RNA are captured and integrated between repeats of the host CRISPR array as new spacers. This creates a heritable memory of the invader for future encounters (3, 5-8). In the ‘expression’ step, the CRISPR array is transcribed into a single long precursor CRISPR RNA (pre-crRNA), which is processed into mature crRNAs. Each crRNA then assembles with Cas protein(s) into a surveillance ribonucleoprotein effector complex, such as type I Cascade or the type II Cas9, type V Cas12, and type VI Cas13 effectors. During the final ‘interference’ step, base pairing between the crRNA and a complementary target (via R-loop formation in the case of DNA targets) triggers cleavage and degradation of the invading nucleic acid (Fig. 1A) (1, 9-12).
CRISPR-Cas systems are divided into two major classes. Their effector modules, which are responsible for crRNA processing and interference, have distinctly different architectures (Fig. 1B). Class 1 comprises types I, III, and IV, which are further divided into 16 subtypes, whereas class 2 includes types II, V, and VI, which are divided into 17 subtypes (13). Class 1 systems use multi-subunit effector complexes, such as Cascade in type I, the Csm complex in type III-A, and the Cmr complex in type III-B systems. In many class 1 subtypes, additional Cas proteins also contribute to pre-crRNA processing or interference. In contrast, class 2 systems use a single, large, multidomain crRNA-binding protein, such as Cas9 in type II, Cas12 in type V, and Cas13 in type VI systems. These single-protein effectors usually carry out target interference on their own and, in some variants, also process pre-crRNA.
Likewise, two interference strategies against MGEs exist, and which one is used depends on the type of effector complex (14). In types I, II, IV, and V CRISPR-Cas systems, DNA-targeting effector complexes directly destroy invading DNA via crRNA-guided cleavage. Target DNA binding activates the DNase activity of the effector complex, leading to specific degradation of the target DNA and limiting infection. Types III and VI CRISPR-Cas systems instead use RNA-targeting effector complexes. These are activated upon target RNA binding and cleave the target RNA through their RNase activity. Accessory Cas proteins can also help combat the invader, either through protease-mediated signaling pathways (e.g., TPR-CHAT/Csx29 in type III-E) (15-22) or through collateral RNase activity (e.g., Csm6 in type III-A) (Fig. 1B) (23, 24). These accessory activities inhibit infection indirectly, through cellular signaling pathways that lead to the activation of downstream defense genes.
Despite their evolutionary distance and structural differences, type III and type VI CRISPR-Cas systems share the ability to sense and cleave target RNA. In some variants, they can also acquire spacers from RNA. Type III systems, believed to be the oldest members of the CRISPR-Cas family (25, 26), are further classified into six annotated subtypes, III-A to III-F (13). They account for 25% of all CRISPR-Cas loci in bacteria and 34% in archaea (27). Class 1 type III effector complexes generally consist of multiple subunits, including Csm2, Csm3, Csm4, and Csm5 in types III-A and III-D, and Cmr1, Cmr3, Cmr4, Cmr5, and Cmr6 in types III-B and III-C (11, 28-31). The signature Cas10 subunit (Csm1 in types III-A and III-D; Cmr2 in types III-B and III-C) is typically part of the effector complex. Target RNA binding induces a conformational change in Cas10 that switches on two activities:
- a DNase activity that cleaves nearby ssDNA exposed in co-transcriptional R-loops;
- a cyclase activity that converts ATP into cyclic oligoadenylates, second messengers that activate the Csm6 protein for collateral RNA degradation.

By comparison, type III-E and III-F systems lack the Cas10 subunit. Among these, the recently reported type III-E effector gRAMP (also called Cas7-11) has a unique architecture in which several subunits are fused into a single protein, resembling class 2 effectors (15, 16, 21, 32-35). gRAMP forms a complex with TPR-CHAT (also known as Csx29), a caspase-like peptidase. Upon target RNA binding, TPR-CHAT cleaves Csx30 to activate the CRISPR-associated sigma factor RpoE, leading to cell-cycle retardation or cell death.
In contrast, type VI CRISPR-Cas systems use a single multidomain protein, Cas13, as the signature effector. Cas13 performs both crRNA processing and target RNA recognition and cleavage during the immune response (36-39). Six type VI subtypes have been identified so far: VI-A to VI-D, Cas13X, and Cas13Y (13, 40). All Cas13 proteins contain two HEPN domains that are critical for RNA cleavage. Upon target RNA binding, a conformational change activates the HEPN RNase. This results in cleavage of the target RNA as well as collateral cleavage of bystander RNAs, which establishes broad, nonspecific immunity (37, 41, 42).
Like all CRISPR-Cas systems, the RNA-targeting systems described above require a mechanism to distinguish self from non-self. DNA-targeting CRISPR-Cas effectors achieve this by recognizing a short DNA sequence motif adjacent to the target, the protospacer adjacent motif (PAM), which they use to locate target sequences in MGEs (43). The repeats flanking spacers in the CRISPR array lack a PAM, so the array itself is not recognized and autoimmunity is prevented. In contrast, the catalytic activity of type III and VI effectors can be regulated by a short RNA sequence flanking the target site, the protospacer flanking site (PFS). Its complementarity to the repeat-derived portion of the crRNA is assessed (44-46). PAM binding activates DNA-targeting effectors. For RNA-targeting effectors, by contrast, mismatches between the PFS and the repeat-derived portion of the crRNA license attack on non-self RNA, while self-transcripts are spared from toxic, nonspecific targeting. In other words, complementarity between the repeat-derived crRNA sequence and the PFS marks a transcript as self and suppresses the autoimmune response. Antisense transcripts of the CRISPR array itself are an example of this situation.
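The tag/anti-tag logic above can be written as a simple rule. The following Python sketch is a deliberately simplified model. It assumes that the repeat-derived crRNA tag and the PFS have the same length, and that a single mismatch is enough to license the effector. Real type III and VI effectors follow position-dependent rules that differ between systems. The function names and the example tag sequence are hypothetical.

```python
"""Simplified tag/anti-tag model of self versus non-self RNA discrimination.

Hypothetical sketch: real type III and VI effectors use position-dependent
rules; here any mismatch between the PFS and the crRNA tag licenses cleavage.
"""

# Watson-Crick pairing for RNA bases.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def is_self_transcript(crrna_tag: str, pfs: str) -> bool:
    """True if the PFS fully base-pairs with the repeat-derived crRNA tag.

    crrna_tag: repeat-derived tag of the crRNA, written 5'->3'.
    pfs: target RNA sequence flanking the protospacer on the side facing the
         tag (the 3' side for type III), written 5'->3'.
    The strands pair antiparallel, so full complementarity means the PFS
    equals the reverse complement of the tag.
    """
    if len(crrna_tag) != len(pfs):
        raise ValueError("tag and PFS must have the same length in this model")
    return pfs == reverse_complement(crrna_tag)


def effector_licensed(crrna_tag: str, pfs: str) -> bool:
    """In this model, any mismatch marks the target RNA as non-self."""
    return not is_self_transcript(crrna_tag, pfs)


if __name__ == "__main__":
    tag = "ACGAGAAC"  # hypothetical 8-nt repeat-derived tag
    anti_tag = reverse_complement(tag)  # as found in an antisense CRISPR array transcript
    print(effector_licensed(tag, anti_tag))    # False: treated as self
    print(effector_licensed(tag, "GUUCUCGA"))  # True: mismatch at the last position
```

In this sketch, an antisense transcript of the CRISPR array carries the exact anti-tag and is therefore ignored. This mirrors the autoimmunity safeguard described above.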
How do bacteria and archaea archive and remember previous invaders? This occurs during the ‘adaptation’ step. In DNA-targeting CRISPR-Cas systems, the highly conserved Cas1-Cas2 complex acquires spacers from DNA and integrates them into the CRISPR array (Fig. 1A) (47-49). Cas1-Cas2 forms a heterohexameric complex of four Cas1 and two Cas2 subunits, Cas1(4)-Cas2(2) (Fig. 2A, B) (3, 5, 47, 48, 50-54). In vivo studies of spacer acquisition suggest that Cas1-Cas2 selects suitable prespacers based on the PAM, which is also a prerequisite for the interference stage of CRISPR immunity (55-57). Structural studies of the E. coli type I-E Cas1-Cas2 complex have shown that the C-terminal tail of Cas1 is responsible for PAM recognition (47, 52). In contrast, Cas1-Cas2 complexes in several other type I subtypes (but not I-E or I-F) rely on an additional adaptation factor, Cas4, for PAM recognition (58-61). In vitro, Cas1-Cas2 preferentially integrates partially duplexed DNAs with single-stranded 3’ overhangs (47, 52, 62). However, a single-molecule study showed that Cas1-Cas2 binds more favorably to PAM-containing single-stranded DNA. This suggests that annealing of complementary ssDNA generates a suitable substrate consisting of a prespacer (PS) duplex with 3’ overhangs (63). In most type II systems, Cas9 and Csn2 act together with the Cas1-Cas2 heterohexamer in PAM recognition, although the precise mechanism remains to be investigated (64-67). During DNA adaptation, the correct orientation of integrated spacers is ensured by asymmetric prespacer trimming, in which processing of the PAM-containing end is delayed. This trimming is carried out either by host DnaQ-family exonucleases or by the Cas4 nuclease itself (59, 63, 68).
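PAM-based selection of prespacers can be illustrated with a naive sequence scan. The Python sketch below assumes, for illustration only, a 5'-AAG-3' PAM located immediately 5' of a 32-nt protospacer on the scanned strand, values commonly cited for E. coli type I-E. The function names are hypothetical. The sketch also ignores everything else that influences prespacer selection in vivo, such as Cas4- or DnaQ-dependent trimming.

```python
"""Naive scan for PAM-adjacent candidate prespacers on both DNA strands.

Illustrative assumptions (hypothetical parameters): a 5'-AAG-3' PAM directly
5' of a 32-nt protospacer on the same strand.
"""

from typing import List, Tuple

DNA_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(DNA_PAIR)[::-1]


def candidate_prespacers(
    dna: str, pam: str = "AAG", length: int = 32
) -> List[Tuple[str, int, str]]:
    """Return (strand, start, protospacer) for every PAM-adjacent window.

    'start' is the 0-based index of the protospacer on the scanned strand;
    for the '-' strand it refers to reverse-complement coordinates.
    """
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # Only positions where both the PAM and a full-length protospacer fit.
        for i in range(len(seq) - len(pam) - length + 1):
            if seq[i:i + len(pam)] == pam:
                start = i + len(pam)
                hits.append((strand, start, seq[start:start + length]))
    return hits


if __name__ == "__main__":
    toy = "AAG" + "C" * 32  # hypothetical 35-nt sequence with one PAM
    for hit in candidate_prespacers(toy):
        print(hit)
```

Scanning both strands reflects that a PAM can lie on either strand of an invading DNA. The sketch only enumerates candidates; it does not model which ones Cas1-Cas2 actually captures.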
Because type III and VI CRISPR-Cas systems can target RNA, it has been proposed that they might also acquire spacers from RNA (49). Interestingly, in some bacteria harboring type III CRISPR-Cas systems, Cas1 is fused to a reverse transcriptase (RT), and these RT domains are thought to be related to the RTs encoded by group II introns (Fig. 3A) (49, 69-72). Several Cas1 proteins and their associated RTs in type III systems even appear to have co-evolved (70, 72, 73). A comprehensive analysis of 537 sequences examined the origin of these RTs and their relationship to the associated CRISPR-Cas systems. It found that Cas1-associated RTs were more prevalent in bacteria (11 clades) than in archaea (only 1 clade) (Fig. 3B). Cas1 mediates spacer acquisition together with Cas2 in a heterohexameric complex that integrates DNA. It has therefore been suggested that an RT domain would be required to convert RNA into DNA for direct spacer acquisition from RNA (69). RT-Cas1 fusion proteins have also been identified in type VI-A CRISPR-Cas systems (74). That study proposed that two variant type VI-A systems independently evolved Cas1-RT fusions, possibly derived from type III-A and III-D systems. Moreover, in many variants a Cas6 endoribonuclease domain is additionally fused to the N-terminus of RT-Cas1 (Fig. 3B) (13, 73, 75, 76). Cas6 is well established as the ribonuclease directly required for pre-crRNA processing (77). Mohr et al. further showed that the Cas6 domain fused to RT-Cas1 is also involved in RNA spacer acquisition (76).
Determining the structure of a protein complex provides a detailed understanding of its molecular mechanism. To this end, Wang et al. recently determined the cryo-EM structure of the complete type III Cas6-RT-Cas1-Cas2 complex from a naturally occurring Thiomicrospira (Thio) species (Fig. 2C) (78). The authors attempted to characterize a DNA-bound complex by combining purified Cas6-RT-Cas1-Cas2 proteins with a DNA substrate designed to mimic a half-site integration intermediate. However, only the DNA-free structure (apo Cas6-RT-Cas1-Cas2) was solved. The Thio apo Cas6-RT-Cas1-Cas2 complex is heterohexameric. Like typical Cas1-Cas2 integrases, it consists of two distal Cas6-RT-Cas1 dimers and a central Cas2 dimer (Fig. 2A-C) (48, 79-81). It nonetheless differs from E. coli Cas1-Cas2 in several structural respects. For example, in the E. coli Cas1-Cas2 complex, two positively charged regions, in the DNA-binding cleft and on the Cas2 dimer, are critical for prespacer binding. These regions enable the intrinsic ruler mechanism that sets prespacer length (47, 52). Similar charged regions are present on the Cas1 domains and Cas2 subunits of the Thio complex. However, the Cas2 dimer and one Cas1 dimer are rotated further away from the other Cas1 dimer, resulting in an altered dimer interface. The authors also reported that the three active sites, of the Cas1-Cas2 integrase, the RT, and the Cas6 maturase, are in close proximity, implying tightly coordinated functional crosstalk (Fig. 2C). Furthermore, the RT domain of Cas6-RT-Cas1 resembles other RTs, such as retroviral and group II intron RTs (Fig. 2D) (82, 83). This suggests that integration by Cas6-RT-Cas1-Cas2 could be followed by target-primed reverse transcription, similar to the retro-homing mechanism of group II introns.
In an alternative approach, Mohr et al. determined the structure of a truncated construct containing the Cas6 maturase domain of the Marinomonas mediterranea (MMB-1) type III-B Cas6-RT-Cas1-Cas2 (Fig. 2E) (84). When superimposed onto the Thio Cas6 domain, the MMB-1 Cas6 structure aligned well overall, except for differences in the β-strand architecture of the C-terminal RRM fold. In addition, the overall architecture of Cas6 domains in Cas6-RT-Cas1 proteins is quite distinct from that of stand-alone Cas6 proteins (84-86). These observations suggest that the Cas6 domain of the Cas6-RT-Cas1-Cas2 complex is critical for crRNA processing. They further suggest that it may also be functionally engaged in one of two additional roles: capturing and processing RNA substrates for reverse transcription by the RT domain, or prespacer integration in cooperation with the Cas1-Cas2 integrase.
Domain-fused proteins, including RT-fused Cas1-Cas2 integrases, typically coordinate a series of sequential reactions. It has been suggested that RNA could serve as a substrate for spacer acquisition by the Cas1-Cas2 integrase, with the assistance of the fused RT. In vivo integration assays using transcripts harboring self-splicing introns showed that RNA-derived sequences can be integrated into CRISPR arrays by the MMB-1 Cas6-RT-Cas1-Cas2 complex (87). Spacers spanning the exon-exon junction created by splicing can only have originated from the spliced RNA, not from the intron-containing DNA. This result was reproduced with the Cas6-RT-Cas1-Cas2 complexes of Fusicatenibacter saccharivorans (F. sac) (88) and Vibrio vulnificus (V. vul) (89). MMB-1 and Thio Cas6-RT-Cas1-Cas2 complexes show substrate versatility: they can integrate dsDNA, ssDNA, and ssRNA oligonucleotides into CRISPR DNA (78, 87). However, site-specific full-site integration occurs only with dsDNA substrates. In MMB-1, both the RT and the Cas1-Cas2 integrase activities of the complex are indispensable for RNA integration, whereas RT activity is not required for DNA integration. In contrast, Thio Cas6-RT-Cas1 exhibits integration activity even in the absence of the Cas2 dimer, although Cas2 improves integration efficiency. Thio Cas6-RT-Cas1-Cas2 integrates dsDNA, ssDNA, and DNA/RNA hybrid substrates significantly more efficiently than ssRNA substrates. Collectively, these results suggest that the RT and Cas1 domains of the Cas6-RT-Cas1-Cas2 complex cooperate closely in reverse transcription and integration (78, 87). Furthermore, mutations in the Cas6 active site affect both integration and RT activity. Mutations in the Cas1 and RT domains, by contrast, have no effect on Cas6 RNA processing activity. This suggests unidirectional crosstalk from the Cas6 domain to the other two domains in Thio Cas6-RT-Cas1-Cas2 (78).
The sequence and length specificity of spacer acquisition by Cas6-RT-Cas1-Cas2 have also been examined closely. In MMB-1, most spacers (70-75%) were 34-36 base pairs (bp) long, with no significant sequence specificity (69). A slight, RT-independent preference for sense-strand spacers was observed in MMB-1, but this strand bias was not reproduced in E. coli. In F. sac, by contrast, the median spacer length was 39 bp, with a distribution skewed toward longer spacers (88). Strikingly, a strong bias toward AT-rich spacers was also observed. This bias could reflect the AT-richness of RNA ends, but it persisted even when only spacers derived from gene bodies were considered. Furthermore, the F. sac Cas6-RT-Cas1-Cas2 complex showed no preference for PAM-like sequence motifs. Instead, most spacers were acquired from regions near start and stop codons, and spacers were preferentially acquired in the antisense orientation. Lastly, in V. vul, most spacers were 34-38 bp long, with no PAM-like preference (89). These spacers did show an antisense bias relative to coding sequences, as well as a significant GC bias. Unlike in F. sac, there was no bias near start and stop codons. Taken together, the different spacer length and sequence specificities observed in these independent studies imply substantial variation among these systems in vivo, which needs to be characterized in future studies.
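The length, composition, and orientation biases compared above are simple summary statistics over sequenced spacers. The Python sketch below computes them for a hypothetical toy spacer set. The input format (sequence, orientation) and the function name are assumptions made for illustration. A real analysis would also compare AT content against the background of the source transcripts before calling it a bias.

```python
"""Summary statistics for a set of acquired spacers (toy data)."""

from statistics import median
from typing import Dict, List, Tuple


def spacer_stats(
    spacers: List[Tuple[str, str]], window: Tuple[int, int] = (34, 36)
) -> Dict[str, float]:
    """Summarize length, AT content and orientation of acquired spacers.

    spacers: list of (sequence, orientation) pairs, orientation being
             'sense' or 'antisense' relative to the source transcript.
    window: inclusive length range to report, e.g. 34-36 bp.
    """
    if not spacers:
        raise ValueError("need at least one spacer")
    seqs = [seq.upper() for seq, _ in spacers]
    lengths = [len(s) for s in seqs]
    lo, hi = window
    at_bases = sum(s.count("A") + s.count("T") for s in seqs)
    antisense = sum(1 for _, orient in spacers if orient == "antisense")
    return {
        "median_length": median(lengths),
        "fraction_in_window": sum(lo <= n <= hi for n in lengths) / len(lengths),
        "at_fraction": at_bases / sum(lengths),  # pooled over all spacer bases
        "antisense_fraction": antisense / len(spacers),
    }


if __name__ == "__main__":
    toy = [  # hypothetical spacers, not real data
        ("A" * 34, "sense"),
        ("AT" * 18, "antisense"),
        ("GC" * 19 + "G", "antisense"),
    ]
    print(spacer_stats(toy))
```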
The Cas1-Cas2 adaptation complex integrates foreign nucleic acids as spacers between the leader sequence and the first repeat of its own CRISPR array (53, 55). Exploiting this nucleic acid-induced memory, molecular recording with CRISPR-Cas systems has emerged as a prominent field of research. It has the potential to transform various areas of biotechnology and the life sciences. Cas1-Cas2-based DNA recording offers several advantages over other methods explored for storing information (90-92). Recently published articles have described diverse approaches to CRISPR-based molecular recording, each with its own advantages and limitations (93). In 2016, Shipman et al. developed the first DNA recording system, using the type I-E Cas1-Cas2 complex in E. coli (94). A key benefit of this system is that the Cas1-Cas2 integrase can capture specific DNA molecules as spacers and integrate them into the CRISPR array directionally and in temporal order, reflecting dynamic changes within a cell in real time (95-98). Because recorded DNA is stored in the array in time order, the recorded information, including the order and lineage of events, can later be reconstructed by sequencing the CRISPR array (95, 99). This permanent record of events within the genome can give researchers valuable insights into the mechanisms underlying cellular behavior and responses to environmental stimuli at the single-cell level (95, 100). Moreover, the CRISPR array is part of the bacterial genome. It therefore provides a reliable, long-term record that, in some prokaryotes, can be retrieved and analyzed without external devices or systems. However, these molecular recording systems have several limitations that need to be addressed.
For example, these systems integrate spacers derived from DNA rather than RNA, and therefore cannot capture the transcriptional dynamics of the cell. In addition, exogenous nucleic acids must be transformed into cells to verify the specificity and efficiency of molecular recording (99).
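New spacers are added next to the leader, so the order of recorded events can be read back from array sequences. The Python sketch below models this in an idealized way. It assumes that every integration event occurs at the leader-proximal end, and that each spacer can be labelled by the signal it was derived from. The labels and function names are hypothetical. Aggregating pairwise orderings across many arrays, as done here, is one simple way to infer which of two recorded events came first. It is not presented as the method used in the cited studies.

```python
"""Idealized model of time-ordered spacer integration in a CRISPR array.

Assumption: every new spacer is inserted at the leader-proximal end, so the
array is stored leader-proximal first (index 0 = most recent spacer).
"""

from collections import Counter
from itertools import combinations
from typing import List


def integrate(array: List[str], spacer: str) -> List[str]:
    """Return a new array with the spacer inserted next to the leader."""
    return [spacer] + array


def chronological(array: List[str]) -> List[str]:
    """Oldest spacer first: read the array from its leader-distal end."""
    return list(reversed(array))


def pairwise_order_votes(arrays: List[List[str]]) -> Counter:
    """Count, across many arrays, how often signal X was recorded before Y.

    Returns a Counter keyed by (earlier, later) pairs of spacer labels.
    """
    votes = Counter()
    for array in arrays:
        for i, j in combinations(range(len(array)), 2):
            # i < j, so array[i] is closer to the leader and was acquired later.
            earlier, later = array[j], array[i]
            if earlier != later:
                votes[(earlier, later)] += 1
    return votes


if __name__ == "__main__":
    cell = integrate(integrate([], "signal_A"), "signal_B")  # A recorded first
    print(cell)                 # leader-proximal first
    print(chronological(cell))  # oldest first
    print(pairwise_order_votes([cell, cell, ["signal_A"]]))
```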
Transcriptional molecular recorders are innovative tools that capture transcriptional events and incorporate them into the genome of the cell. This approach could reveal the precise timing, order, and intensity of transcriptional activity at the cellular level without requiring multiple destructive assays (Fig. 4A). To this end, Schmidt et al. used F. sac Cas6-RT-Cas1-Cas2 for transcriptional recording instead of the DNA-integrating E. coli Cas1-Cas2 complex (88). They chose it because Cas6-RT-Cas1-Cas2 had been shown to integrate spacers derived from cellular transcripts (Fig. 4B) (69). This study showed that transcriptional recording can track transcriptional responses to specific stimuli within bacterial cells (88). Furthermore, RNA-derived spacers stored in the CRISPR array provided temporal, transcriptome-wide records of responses to various stresses (99, 101). In a more recent study, Schmidt et al. multiplexed transcriptional recording to track stress-response histories in a physiological environment. Engineered bacteria served as sentinel cells that recorded environmental changes in the mouse gut, read out by Record-seq (Fig. 4B) (101). Taken together, these studies support the idea that RT-Cas1-Cas2-based recording can be a useful tool for studying the evolution of cells in various environmental contexts, as well as for developing novel therapeutics and diagnostics. Lastly, retron RTs, which reverse transcribe non-coding RNA (ncRNA), have been proposed as an alternative to RT-Cas1-Cas2 for recording temporal memories of transcriptional events in live cells (102).
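Reading out a transcriptional record amounts to assigning each sequenced spacer to the transcript it came from, then counting. The Python sketch below is a toy version under stated assumptions:
- spacers are matched to hypothetical gene sequences by exact substring search;
- a match on the coding strand counts as sense, and a match of the reverse complement counts as antisense;
- the first matching gene wins, and no normalization is applied.

A real analysis pipeline would instead align spacers to a reference genome with a dedicated read aligner and normalize the counts. The function names are hypothetical.

```python
"""Toy read-out of a transcriptional record: count spacers per gene and strand."""

from typing import Dict, List, Tuple

DNA_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(DNA_PAIR)[::-1]


def count_spacers_per_gene(
    spacers: List[str], genes: Dict[str, str]
) -> Tuple[Dict[str, Dict[str, int]], int]:
    """Assign spacers to genes by exact match on either strand.

    genes: mapping of gene name -> coding-strand sequence (5'->3').
    Returns per-gene sense/antisense counts and the number of unassigned spacers.
    """
    counts = {name: {"sense": 0, "antisense": 0} for name in genes}
    unassigned = 0
    for spacer in spacers:
        for name, seq in genes.items():
            if spacer in seq:
                counts[name]["sense"] += 1
                break
            if reverse_complement(spacer) in seq:
                counts[name]["antisense"] += 1
                break
        else:  # no gene matched this spacer
            unassigned += 1
    return counts, unassigned


if __name__ == "__main__":
    genes = {"geneA": "ATGAAACCCGGGTTTTAA", "geneB": "ATG" + "C" * 10 + "TAG"}  # hypothetical
    spacers = ["AAACCCGGG", "GGGGGGGG", "TTTTTTTTT"]  # hypothetical spacers
    print(count_spacers_per_gene(spacers, genes))
```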
Over the last decade, CRISPR-based genome engineering has been revolutionized, along with our understanding of the mechanisms of CRISPR-Cas adaptive immunity. DNA-cutting CRISPR-Cas effectors, such as type II Cas9 and type V Cas12a (also termed Cpf1), have been the main focus of engineering efforts, yielding diverse genome editing technologies (103). The demand for RNA-targeting tools has led to RNA-targeting type VI Cas13 effectors being repurposed for gene silencing (104) and RNA diagnostics (105). Beyond gene editing, efforts continue to develop novel CRISPR-based technologies, such as CRISPR-mediated imaging, epigenome manipulation, and molecular recording tools. In this review, we focus on recent advances in, and understanding of, molecular recording techniques based on CRISPR adaptation modules. Recent genetic, structural, and biochemical studies now explain how the proteins involved in DNA-based CRISPR adaptation ensure efficient and accurate spacer acquisition (8). However, our knowledge of the mechanism of RNA-based CRISPR adaptation remains very limited.
To extend our knowledge and overcome the technical limitations of current RNA recording tools, further studies are needed to better understand the basic mechanism of RT-Cas1-Cas2-mediated CRISPR adaptation. Outstanding questions include the following:
- What are the structures of RT-Cas1-Cas2 bound to ssRNA, RNA/DNA hybrid, or dsDNA substrates?
- How do the domains of RT-Cas1-Cas2 cooperate during RNA spacer acquisition, and which role of Cas6 is critical in this process?
- How does the RT domain in this complex reverse transcribe RNA substrates?
- How is the entire process, from ssRNA capture to full-site integration, coordinated?
- How is the appropriate spacer size determined during RNA adaptation?
- How does the orientation of integrated spacers ensure immunity?

It is also unknown whether RNA spacer acquisition contributes to immunity against RNA bacteriophages and other foreign RNA elements, besides targeting nascent viral transcripts.
RT-Cas1-Cas2-based molecular recorders have several promising advantages that warrant their future use, but they still have inherent limitations that need to be addressed. It remains unclear whether these recorders can function in the same way in eukaryotic systems. Implementing RT-Cas1-Cas2-based recorders in mammalian cells may depend on non-Cas bacterial factors (51, 106). Alternatively, it may require combination with epigenome-manipulating techniques in eukaryotic environments. Furthermore, the fraction of CRISPR arrays that integrate a transcript-derived spacer of interest could be very low, which would make such arrays hard to detect. Not all spacers of interest will be integrated as expected, and this inefficient recording may further reduce the chance of detecting a given spacer. Therefore, future research will need to develop optimized sequencing techniques to overcome this detection limit.
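The detection problem can be made concrete with a simple probabilistic model. Assume that each sequenced array independently carries the spacer of interest with probability f = p_expanded × p_target. Here, p_expanded is the fraction of arrays that acquired any new spacer, and p_target is the fraction of new spacers derived from the transcript of interest. The chance of observing the spacer at least once in N sequenced arrays is then 1 - (1 - f)^N. The Python sketch below evaluates this model. The parameter names and example values are hypothetical, not measured.

```python
"""Toy model of the chance of detecting a recorded spacer of interest."""

import math


def detection_probability(n_arrays: int, p_expanded: float, p_target: float) -> float:
    """P(at least one of n_arrays sequenced arrays carries the spacer of interest)."""
    f = p_expanded * p_target  # per-array probability, assuming independence
    return 1.0 - (1.0 - f) ** n_arrays


def arrays_needed(p_expanded: float, p_target: float, confidence: float = 0.95) -> int:
    """Smallest N with detection_probability(N, ...) >= confidence."""
    f = p_expanded * p_target
    if not 0.0 < f < 1.0:
        raise ValueError("per-array probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


if __name__ == "__main__":
    # Hypothetical values: 1% of arrays expanded, 0.1% of new spacers from the target.
    n = arrays_needed(0.01, 0.001)
    print(n, detection_probability(n, 0.01, 0.001))
```

The per-array probability is a product of the two factors, so low acquisition efficiency and a small share of target-derived spacers compound each other. This is why the required sequencing depth can grow quickly.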
This work was supported by IBS-R008-D1, Young Scientist Fellowship program of the Institute for Basic Science from the Ministry of Science and ICT of Korea.
The authors have no conflicting interests.