The CRISPR system has been identified as an antiviral immune system in bacteria and archaea (1, 2). In the CRISPR interference step, CRISPR-Cas effectors act as guide RNA-directed endonucleases that induce double-strand breaks in the target DNA sequence to remove or suppress invading foreign genes (3). When CRISPR effectors are reprogrammed for target genes and used for gene editing in cells, genetic information can be changed by inducing a double-strand break in the target DNA (4, 5). Owing to these characteristics, CRISPR has been widely applied in fields such as the generation of transgenic animals and plants (6), the characterization of genes, and the development of therapeutic biomaterials (7, 8). Although the CRISPR system is currently one of the most powerful gene editing tools, CRISPR effectors recognize targets through the guide RNA; consequently, unintended mutations at off-target sequences that resemble the target remain a common challenge (9-14). In this review, the cause of off-targeting is explained in terms of the structure of CRISPR effectors, and previous studies that aimed to solve the off-targeting challenge are summarized.
Numerous structural and biochemical studies have revealed how the CRISPR system recognizes and binds a target sequence and then induces DNA cleavage (15-18). The CRISPR-Cas9 effector is a typical single-unit endonuclease belonging to the Class II/type II clade (19), and its target recognition and cleavage mechanisms were first identified by structural studies of the Cas9-guide RNA-target DNA ternary complex (Fig. 1A) (16, 18). In the Cas9 structure, the PAM-interacting (PI) domain directly contacts the PAM sequence (NGG) (Fig. 1A, Right inset). After PAM recognition, the interaction between a specific amino acid residue and the DNA backbone forms a phosphate lock loop and kinks the base immediately following the PAM sequence (17, 18). The guide RNA then base-pairs with the target strand (TS) of the target DNA to form an RNA-DNA heteroduplex and an R-loop structure (Fig. 1A, Left inset) (18). Within the heteroduplex, the PAM-proximal region is known as the seed region and is very sensitive to mismatches between the guide RNA and the target DNA (13, 20, 21). In general, unintended mutations by the Cas9 effector arise when mismatches in the PAM-distal region are tolerated and cleavage still proceeds.
On the other hand, the CRISPR-Cas12 system is classified as Class II/type V, and the endonuclease functions of various orthologs with characteristics different from those of Cas9 have been identified (22, 23). Among them, the CRISPR-Cas12a effector is the type V prototype and, unlike Cas9, recognizes a thymine-rich PAM (24). In the ternary structure, Cas12a recognizes the PAM through direct base contacts and structure-based recognition of the entire PAM region, which differs from the Cas9 pattern (Fig. 1B, Left inset) (25-27). However, the guide RNA forms a heteroduplex with the TS of the target DNA in much the same way as in Cas9 (Fig. 1B, Right inset). According to currently reported results, Cas12a has a seed region in the PAM-proximal region that is more sensitive to mismatches between the guide RNA and the target strand DNA (28-30). The Cas12a effector tolerates mismatches only in regions far from the PAM, and relatively few off-target mutations are detected compared with Cas9. In addition, effectors such as Cas12b (31-33), Cas12f (34-36), and Cas12m (37), for which guide RNA-based target recognition was recently identified, show the same target recognition properties as the Cas12a effector. These findings suggest that the Cas12 family can be more advantageous than the Cas9 family in terms of the accuracy of target DNA recognition during genome editing in various living organisms.
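The distinction between seed (PAM-proximal) and PAM-distal mismatches described above can be illustrated with a minimal Python sketch. It is not taken from the cited studies: the function names, the `pam_side` labels, and the default `seed_len=10` are hypothetical choices for illustration, and the real seed length differs between effectors. The guide is written as its DNA-equivalent protospacer sequence (5' to 3'), with the PAM assumed to lie 3' of the protospacer for Cas9 and 5' of it for Cas12a.

```python
def mismatch_positions(guide, site):
    """Return 0-based positions at which the guide and a candidate site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]


def classify_mismatches(guide, site, pam_side, seed_len=10):
    """Split mismatches into seed (PAM-proximal) and PAM-distal positions.

    pam_side: "3prime" (Cas9-like, PAM follows the protospacer) or
              "5prime" (Cas12a-like, PAM precedes the protospacer).
    seed_len: assumed seed length; an illustrative value, not a measured one.
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    n = len(guide)
    # The seed is the stretch of the protospacer adjacent to the PAM.
    seed = range(n - seed_len, n) if pam_side == "3prime" else range(0, seed_len)
    result = {"seed": [], "distal": []}
    for i in mismatch_positions(guide, site):
        result["seed" if i in seed else "distal"].append(i)
    return result


guide = "GACGTTACGATCGATCGGAT"
site = "GTCGTTACGATCGATCGGCT"  # differs from the guide at positions 1 and 18
print(classify_mismatches(guide, site, "3prime"))  # {'seed': [18], 'distal': [1]}
print(classify_mismatches(guide, site, "5prime"))  # {'seed': [1], 'distal': [18]}
```

The same pair of mismatches is classified differently for the two effector types because the seed sits at opposite ends of the protospacer, which is one reason a given off-target site can be tolerated by one effector and rejected by another.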
Single-molecule-level studies have explained how CRISPR effectors search for target sequences within genes, as well as the mechanism of R-loop formation and cleavage following target binding (38, 39). In the CRISPR-Cas9 system, the Cas9 effector locates the PAM sequence (two consecutive guanines) through a one- to three-dimensional (1D-3D) search process and then binds stably to the target sequence complementary to the guide RNA (Fig. 1C, Top) (17, 40). Once Cas9 adopts a structure that is favorable to DNA cleavage, a double-strand break is induced in the target DNA by two cleavage domains (the RuvC and HNH domains). In the CRISPR-Cas12 system, unlike in Cas9, the non-target strand (NTS) and the target strand (TS) are cleaved sequentially by a single cleavage domain (RuvC) after the PAM search and target sequence binding (Fig. 1C, Bottom) (30). This difference in the cleavage method indicates that the Cas12 and Cas9 effectors form different structures when cleaving sequences at various off-target sites (41); thus, their off-target effects can be very different. In other words, when mismatches form within the guide RNA-target DNA heteroduplex, each gene may show a different off-target cleavage profile depending on the structures that favor cleavage induction.
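The PAM search step described above has a simple computational analogue: scanning both strands of a sequence for PAM-adjacent protospacers. The following is a minimal sketch, not the method of any cited tool. The function name `find_sites`, the tuple layout, and the `"3prime"`/`"5prime"` labels are hypothetical; it assumes an NGG PAM located 3' of a 20-nt protospacer for Cas9 and a TTTV PAM located 5' of the protospacer for Cas12a.

```python
import re

# Subset of IUPAC nucleotide codes, sufficient for the PAMs used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "V": "[ACG]", "R": "[AG]", "Y": "[CT]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, pam, pam_side, spacer_len=20):
    """List (strand, start, protospacer, pam) for every PAM-adjacent site.

    start is the 0-based offset of the match on the scanned strand, i.e. on
    seq for "+" and on the reverse complement of seq for "-".
    """
    pam_re = "".join(IUPAC[c] for c in pam.upper())
    spacer_re = f"[ACGT]{{{spacer_len}}}"
    # A lookahead is used so that overlapping sites are all reported.
    if pam_side == "3prime":
        pattern = f"(?=({spacer_re})({pam_re}))"
    elif pam_side == "5prime":
        pattern = f"(?=({pam_re})({spacer_re}))"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for m in re.finditer(pattern, s):
            first, second = m.group(1), m.group(2)
            spacer, pam_seq = (first, second) if pam_side == "3prime" else (second, first)
            hits.append((strand, m.start(), spacer, pam_seq))
    return hits


seq = "TTT" + "GACGTTACGATCGATCGGAT" + "TGG" + "AAA"
print(find_sites(seq, "NGG", "3prime"))   # Cas9-style sites
print(find_sites(seq, "TTTV", "5prime"))  # Cas12a-style sites
```

This sketch only enumerates sites that satisfy the PAM requirement; it says nothing about binding stability or cleavage, which depend on the effector structure as discussed above.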
Since CRISPR effectors recognize targets based on guide RNA, mutations in target-like off-target sequences are common. Two approaches can be applied to minimize such off-target events. The first is to design an optimal guide RNA that minimizes off-target mutations in vivo by using a database-driven in-silico method. The second is to construct a universal CRISPR effector that operates precisely on the target sequence. This section introduces both approaches.
Minimizing off-target effects with in-silico based prediction: In-silico design and selection of guide RNAs was the first approach studied for minimizing off-targeting and inducing precise gene editing at various endogenous targets (Fig. 2A) (42, 43). This approach uses open database sources to search the whole genome for all possible off-target candidates that resemble the target sequence, and it proposes the best-fit guide RNA with no off-target candidates by comparing the degree of similarity of each candidate with the genuine target sequence. In addition to this prediction approach, a method for verifying the selected guide RNA candidates with high accuracy at the cellular level has been reported (44).
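The core idea of this in-silico approach, enumerating genomic sites that resemble the target and preferring guides with few such sites, can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of the cited tools: similarity is reduced to a plain mismatch count, the PAM is assumed to be NGG on the 3' side (Cas9-like), and the function names and the `max_mismatches=3` threshold are hypothetical. Published tools typically also weight mismatches by position and may allow bulges.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(genome, guide, max_mismatches=3):
    """Find NGG-adjacent sites within max_mismatches of the guide.

    Returns (strand, start, site, mismatches); start is the 0-based offset
    on the scanned strand ("-" means the reverse complement of genome).
    """
    guide = guide.upper()
    n = len(guide)
    hits = []
    for strand, s in (("+", genome.upper()), ("-", revcomp(genome))):
        for i in range(len(s) - n - 2):
            site, pam = s[i:i + n], s[i + n:i + n + 3]
            if pam[1:] != "GG":  # assumed NGG PAM requirement
                continue
            mm = hamming(guide, site)
            if mm <= max_mismatches:
                hits.append((strand, i, site, mm))
    return hits


def count_off_targets(genome, guide, max_mismatches=3):
    """Count candidate sites that are not a perfect match to the guide."""
    return sum(1 for h in candidate_sites(genome, guide, max_mismatches) if h[3] > 0)


guide = "GACGTTACGATCGATCGGAT"
# Toy "genome": one perfect on-target site and one site with two mismatches.
genome = "AAAA" + guide + "TGG" + "AAAA" + "GTCGTTACGATCGATCGGCT" + "AGG" + "AAAA"
for hit in candidate_sites(genome, guide):
    print(hit)
print(count_off_targets(genome, guide))
```

Given several candidate guides for the same gene, sorting them by `count_off_targets` would mimic, in a very coarse way, the selection of a best-fit guide. A linear scan like this is only practical for short sequences; genome-scale searches rely on indexed data structures.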
Development of the off-target detection methodologies in living systems: In addition to systems that use databases to enumerate off-target candidates, methodologies that experimentally detect actual off-targets are steadily being developed. They include GUIDE-seq (45), BLISS (46), BLESS (47), IDLV (49), CasKAS (50), GUIDE-tag (51), HTGTS (52), and Discover-seq (53). These methods analyze off-targets across the whole genome in an unbiased manner and identify the sites that a CRISPR effector with a specific guide RNA actually targets in the intracellular genome (Fig. 2B, Top). On the other hand, technologies such as CIRCLE-seq (54), Digenome-seq (55), SITE-seq (48), and Extru-seq (56), which analyze genomic DNA extracted from cells, can detect off-target candidates with considerable accuracy (Fig. 2B, Bottom).
Protein engineering: Due to the unique characteristics of CRISPR's components, most CRISPR-based systems are susceptible to off-target activity. Efforts to minimize off-target effects and improve the target specificity of the classical CRISPR-Cas9 include modifying PAM recognition to extend the targeting range (e.g., xCas9 (3.7) (57) and SpCas9-NG (58)) and introducing substitutions that limit off-target interactions (e.g., SpCas9-HF1 (59), eSpCas9 (60), HeF-SpCas9 (61), EvoSpCas9 (62), and HypaCas9 (63)). These variants, which bind off-target DNA more weakly, were generated either by changing amino acid residues in direct contact with the target DNA into neutrally charged forms (Fig. 3A) (59-61) or by introducing random mutations (62). Specificity gains obtained through protein engineering tend to hold across a wide range of targets. However, for mutants that weaken binding to the target DNA as a whole, the accompanying loss of on-target efficiency remains a challenge to be solved. A further disadvantage is that each CRISPR effector requires its own specialized engineering and optimization.
Guide RNA engineering: The guide RNA can effectively control binding to target DNA. The CRISPR system binds on- and off-target DNA through a stabilized heteroduplex formed between the guide RNA and the target DNA (64). Recent reports have enhanced the target specificity of the CRISPR-Cas9 (65) and CRISPR-Cas12 (66) effectors by changing the hybridization energy within the heteroduplex (Fig. 3B). These studies proposed that partially substituting DNA into a guide otherwise composed entirely of RNA changes the hybridization energy and thereby makes the effector more sensitive to mismatches arising from off-target binding. Notably, editing efficiency at the on-target site, where the guide with optimized DNA substitution forms a perfect match with the target DNA, does not decrease and tends to be induced normally. In addition to DNA substitution, many other types of guide RNA engineering have been attempted to improve target specificity in various cellular systems (67-71). The advantage of guide RNA engineering over protein engineering is that effective guides are easy to screen. However, such engineered guides have to be produced by synthesis, which can be expensive depending on the CRISPR effector.
Enhanced specificity with prime editing: Prime editing, the latest tool in the arsenal of CRISPR-based genome engineering technology, has enabled targeted and versatile gene editing in various living organisms (72-80). Using reverse transcriptase (RT) activity, it can introduce numerous DNA modifications (insertions, deletions, and substitutions) without deleterious double-strand breaks. Because the prime editing guide RNA (pegRNA) anneals to both the target strand and the non-target strand for target recognition and RT operation, prime editing is very sensitive to mismatches caused by off-target binding (Fig. 3C, Left) (81). Prime editing has been shown to achieve target specificity that is relatively superior to that of indel induction by DNA cleavage with the existing CRISPR system (82, 83). However, the current technology has two disadvantages: the overall editing efficiency in vivo is low at various gene sites, and byproduct indels are generated when the advanced PE3 system is used to overcome the low efficiency (Fig. 3C, Right) (84, 85). If the editing efficiency of prime editing itself is increased in the near future, sophisticated and diverse gene editing in living organisms may become possible.
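The dual annealing of the pegRNA can be made concrete with a small sketch of how its 3' extension relates to the target site. This is an illustration based on the commonly described pegRNA layout, not a design procedure from the cited studies: it assumes that Cas9 nicks the PAM-containing (non-target) strand 3 nt upstream of the PAM, that the primer binding site (PBS) is complementary to the nicked strand immediately 5' of the nick, and that the extension is ordered 5'-RT template-PBS-3'. The function name, the `pbs_len` default, and the example edit are hypothetical, and sequences are written with DNA letters for simplicity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def peg_extension(protospacer, edited_flap, pbs_len=13, nick_offset=3):
    """Sketch of a pegRNA 3' extension (RT template + PBS), in DNA letters.

    protospacer: PAM-strand sequence (5'->3') lying immediately 5' of the PAM;
                 the spacer with this sequence pairs with the target strand.
    edited_flap: desired PAM-strand sequence starting right after the nick,
                 including the intended edit.
    nick_offset: assumed distance (nt) between the nick and the PAM.
    """
    nick = len(protospacer) - nick_offset
    if not 0 < pbs_len <= nick:
        raise ValueError("pbs_len must fit between the protospacer start and the nick")
    # The PBS pairs with the 3' end of the nicked non-target strand.
    pbs = revcomp(protospacer[nick - pbs_len:nick])
    # The RT template encodes the edited sequence that RT copies onto that 3' end.
    rt_template = revcomp(edited_flap)
    return rt_template + pbs


protospacer = "GACGTTACGATCGATCGGAT"
# Original PAM-strand sequence after the nick is GAT + TGG (PAM) + ...;
# the hypothetical edit below changes the first base after the nick from G to C.
extension = peg_extension(protospacer, "CATTGGAAAC", pbs_len=8)
print(extension)
print(revcomp(extension))  # PBS-binding region followed by the edited flap
```

Under these assumptions, an off-target site must match the spacer on the target strand and also match the PBS (and ideally the RT template) on the non-target strand before editing can occur, which is consistent with the higher mismatch sensitivity described above.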
Other CRISPR-Cas12 orthologs and OMEGA endonucleases with superior accuracy for genome editing: As research on the CRISPR-Cas12 family progresses, Cas12 orthologs of various forms and functions are being discovered (23, 86-90). Because these Cas12 orthologs are generally sensitive to mismatches during heteroduplex formation, they have been reported to have excellent specificity with little off-target activity. In addition, very small ωRNA-guided endonucleases that contain only the core domain of the Cas12 family and have properties very similar to those of the Cas12 orthologs were discovered (Fig. 3D) (91, 92). The TnpB and Fanzor endonucleases of the OMEGA family are components of IS200/IS605 transposons whose functions have been newly studied in prokaryotes or eukaryotes. OMEGA endonucleases such as IscB, for which gene editing research has recently been actively conducted, also show high target specificity and promising editing results for application in human systems (93). In the future, these emerging endonucleases are expected to be widely applied as gene editing tools owing to their accuracy.
Since the mechanism of the CRISPR-Cas9 system was identified, many types of CRISPR-based technologies have been developed and applied in biological systems. The off-target problem of CRISPR effectors revealed in previous studies remains a challenge to overcome. Fortunately, new RNA-guided endonucleases with excellent target specificity, such as Cas12 and the OMEGA system, are continuously being discovered, and their DNA editing functions are being identified. The off-target detection methods that are continuously being developed for these endonucleases, together with existing endonuclease engineering technology, provide a basis for the development of accurate genome editing technology. These bioengineering methods will become a foundation for approaches that can induce genome editing safely and effectively.
This research was supported by grants from the National Research Foundation funded by the Korean Ministry of Education, Science and Technology (NRF-2022R1A2C4001609) and the Korean Fund for Regenerative Medicine (KFRM) grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Health & Welfare) (22A0203L1). This research was also supported by the Chung-Ang University Graduate Research Scholarship in 2023.
The authors have no conflicting interests.