
Owing to rapid advancements in next-generation sequencing (NGS), genomic alterations are now considered essential predictive biomarkers that affect treatment decisions in many cases of cancer. Among the various predictive biomarkers, tumor mutation burden (TMB), which is measured by NGS, is considered useful for predicting the clinical response of cancers treated with immunotherapy. In this study, we directly compared the results of a lab-developed test (LDT), the target sequencing panel K-MASTER panel v3.0, with those of whole-exome sequencing (WES) to evaluate the concordance of TMB. As an initial step, reference materials (n = 3) with known TMB status were used as an exploratory test. To validate and evaluate TMB, we used one hundred samples acquired from surgically resected tissues of non-small cell lung cancer (NSCLC) patients. The TMB of each sample was tested by both the LDT and WES methods, using DNA extracted from the samples at the same time. In addition, we evaluated the impact of the capture region, which might lead to different TMB values; this evaluation was based on the size of the NGS and target sequencing panels. In this pilot study, TMB was evaluated by LDT and WES using duplicated reference samples; the TMB results showed high concordance (R² = 0.887). This was also reflected in the clinical samples (n = 100), which showed an R² of 0.71. The difference between the coding sequence ratio (3.49%) and the ratio of mutations (4.8%) indicated that the LDT panel identified a relatively higher number of mutations. Calculating TMB with the LDT panel was feasible and can be useful in clinical practice. Furthermore, a customized approach to calculating TMB must be developed according to cancer type and specific clinical setting.
Next-generation sequencing (NGS) has advanced considerably in recent years. In clinical practice, comprehensive genome profiling is now performed extensively with NGS because it has a short turnaround time and an acceptable operating cost (1, 2). In particular, cancer patients are being extensively screened with NGS. The results are used to identify actionable genomic alterations, which is now an essential step in deciding the preferred mode of treatment (3). As the number of target genes has increased, panels with optimized content and size have been constructed by targeting genes related to specific cancer types. This has been the conventional mode of NGS for the past several years (4, 5).
The initial treatment decision is usually made after NGS, which accurately detects the following types of mutation: single nucleotide variation (SNV), insertion and deletion (INDEL), copy number variation (CNV), and fusion. Recently, immune checkpoint inhibitors (ICIs) that target the PD-1/PD-L1 axis have become a standard treatment for patients with different types of cancer (6, 7), including non-small cell lung cancer (NSCLC). Therefore, several efforts are being made to develop a predictive biomarker based on genomic alteration, which is a daunting task in genomics. Presently, tumor mutation burden (TMB) is considered a candidate biomarker because several studies have reported that tumors with a high mutation burden are more likely to respond to ICI treatment (8-10). Interestingly, this finding has also been observed among patients with the same type of cancer. Many retrospective studies of NSCLC patients have reported that clinical outcomes of ICI treatment were better in patients with a high mutation burden, as determined by whole exome sequencing (11). Moreover, this finding was confirmed by evaluating TMB with a target sequencing panel (12). Therefore, a target sequencing panel can be used effectively in cancer treatment: it is not only a companion diagnostic test that detects oncogenic drivers for targeted therapy, but it is also used to determine microsatellite instability (MSI) and TMB for cancer immunotherapy (13, 14).
The TMB value indicates the total number of mutations in the analyzed genomic region, and it is reported to vary by tumor type (15). When assessing TMB by whole exome sequencing (WES), oncologists usually count somatic mutations across the entire exonic region, regardless of whether they are synonymous or non-synonymous (11, 16). In contrast, calculating TMB with a target sequencing panel presents several challenges. Firstly, a target sequencing panel contains a limited number of genes, so representativeness becomes an issue for a small panel. Therefore, panels larger than 1 megabase pair (Mbp) have been suggested (17, 18). Secondly, TMB calculation relies on specific methods for scoring mutations and for defining cutoff levels. These parameters, which determine the deleterious and clinically significant variants, have not been standardized to date (19, 20). This uncertainty has generated controversy over the definition of high versus low TMB. Moreover, certain studies show that carcinomas differ markedly, so no cutoff value can be used universally (21). Last but not least, very few studies have demonstrated the prospective clinical benefit of TMB calculated by a target sequencing panel.
Since a target sequencing test is a laboratory-developed test (LDT) used in clinical practice, its results must be validated against gold standard methods in advance. Its clinical implementation can then be decided on the basis of those results. CancerSCANⓇ (Twist Biosciences, CA, USA), a next-generation cancer gene panel, targets cancer-related genes; its clinical efficacy has been found to be high in targeted therapy, which is based on genomic alteration and is used to treat many types of cancer (22). To date, more than 15,000 patients have undergone genome profiling with the CancerSCANⓇ panel, and it has been used to analyze various clinical specimens (23-25). Recently, we developed TMB calculation algorithms for CancerSCANⓇ. In this study, we analyzed how effectively CancerSCAN could detect TMB (pTMB) in NSCLC samples, and we compared it with TMB calculated by the WES method (wTMB).
As an initial step, we compared pTMB and wTMB using reference samples (n = 3) whose TMB values were already known. The average sequencing depth was above 750×, and the duplication rate was stable at an average of 16.3% (Supplementary Table 1). When the first set was sequenced and analyzed with CancerSCAN, the absolute numbers of variants identified in the three samples were 8, 18, and 23. When these counts were divided by the total target coding region of 1.1 Mb, the TMB scores were 6.9, 15.5, and 19.8 (Supplementary Table 2). In the same three samples, the WES method identified 280, 560, and 721 variants, respectively, which gave wTMB values of 8.43, 16.86, and 21.7 after division by the total target region of 33 Mb. The pTMB values from the duplicated second set were 8.6, 19.8, and 22.4, and the corresponding wTMB values were 8.1, 16.0, and 21.7, respectively. Comparing the data from both the initial and the duplicated datasets, we found that the concordance between pTMB and wTMB was R² = 0.887 (Fig. 1).
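The arithmetic behind these scores can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names `tmb` and `r_squared` are hypothetical, the region sizes are the rounded 1.1 Mb and 33 Mb quoted above, and R² is assumed here to be the squared Pearson correlation, which the text does not specify. Because the quoted region sizes are rounded, the computed scores differ slightly from the reported ones (for example, 8 / 1.1 ≈ 7.3 rather than 6.9).

```python
from math import sqrt

PANEL_CODING_MB = 1.1   # rounded CancerSCAN coding region quoted in the text
WES_TARGET_MB = 33.0    # rounded WES target region quoted in the text


def tmb(variant_count, region_mb):
    """Mutations per megabase: variant count divided by region size in Mb."""
    if region_mb <= 0:
        raise ValueError("region size must be positive")
    return variant_count / region_mb


def r_squared(xs, ys):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy / sqrt(sxx * syy)) ** 2


# Variant counts reported for the first reference set.
panel_scores = [tmb(c, PANEL_CODING_MB) for c in (8, 18, 23)]
wes_scores = [tmb(c, WES_TARGET_MB) for c in (280, 560, 721)]

# Reported pTMB and wTMB values: first set followed by the duplicated set.
reported_ptmb = [6.9, 15.5, 19.8, 8.6, 19.8, 22.4]
reported_wtmb = [8.43, 16.86, 21.7, 8.1, 16.0, 21.7]
r2_reported = r_squared(reported_ptmb, reported_wtmb)

print([round(s, 2) for s in panel_scores])
print([round(s, 2) for s in wes_scores])
print(round(r2_reported, 3))
```

Keeping the region size as an explicit argument makes it easy to substitute the exact, unrounded target sizes if they are available.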
The clinical samples were obtained from surgically resected NSCLC tissues (n = 100) and sequenced with CancerSCAN at a mean depth of 1228.6× (Supplementary Table 3). In all specimens, the tumor purity was found to be more than 30% in the pathology laboratory. Moreover, when tumor purity was calculated from the sequencing data actually produced, it was very high, averaging more than 80%. On average, the on-target sequencing coverage was 68.9%. WES was conducted on the same specimens together with matched normal samples; the average coverage was 209.4× for tumor samples and 68.0× for normal samples. The pre-defined cutoff values for high TMB were 10 mut/Mb for wTMB and 16 mut/Mb for pTMB. Based on these values, clinical samples were categorized as either high TMB (TMB-h) or low TMB (TMB-l). The raw data were presented on a scatter plot, which showed a positive correlation between the two methods (R² = 0.71, Fig. 2). In terms of categorization, most of the samples (92.0%) showed no discrepancy between pTMB and wTMB. However, 8 samples (8.0%) were classified as TMB-h by wTMB and as TMB-l by pTMB. Among the concordant samples (n = 92), the TMB-h ratio was 8.7% (8 out of 92 samples). We reviewed the 8 cases that were underestimated by pTMB. In most of them, a relatively high number of mutations was detected in genes that were not included in the CancerSCAN panel. As shown in Fig. 3B, representative genes are marked in orange (described in detail later).
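The categorization step can be illustrated with a short sketch. The two cutoffs are the ones stated above; everything else is an assumption made for illustration: the function names `classify` and `concordance` are hypothetical, the sample values are invented, and a value exactly at the cutoff is treated as TMB-h for both methods.

```python
WTMB_CUTOFF = 10.0  # mut/Mb, WES cutoff stated in the text
PTMB_CUTOFF = 16.0  # mut/Mb, panel cutoff stated in the text


def classify(value, cutoff):
    """Return 'TMB-h' at or above the cutoff, otherwise 'TMB-l'."""
    return "TMB-h" if value >= cutoff else "TMB-l"


def concordance(pairs):
    """Fraction of (pTMB, wTMB) pairs given the same class by both methods."""
    agree = sum(
        classify(p, PTMB_CUTOFF) == classify(w, WTMB_CUTOFF) for p, w in pairs
    )
    return agree / len(pairs)


# Hypothetical (pTMB, wTMB) pairs, invented for illustration only.
example_pairs = [(20.1, 14.2), (5.3, 4.0), (12.0, 11.5), (17.5, 9.0)]
labels = [
    (classify(p, PTMB_CUTOFF), classify(w, WTMB_CUTOFF)) for p, w in example_pairs
]
rate = concordance(example_pairs)

print(labels)
print(rate)
```

The third invented pair mirrors the discordant pattern described above: TMB-h by wTMB but TMB-l by pTMB.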
An additional analysis was conducted with respect to pathologically determined tumor purity, histologic subtype, and tumor differentiation. The concordance between pTMB and wTMB was found to be more than 95.0%. The high concordance was observed regardless of the surgical stage (Supplementary Table 4).
By directly comparing pTMB and wTMB, we found that wTMB values were slightly lower than pTMB values (Fig. 2, R² = 0.71). To evaluate this difference, we calculated the ratio of the coding region covered by panel sequencing to that covered by whole exome sequencing (3.5%). In addition, we calculated the ratio of variants identified by target sequencing to those identified by WES in the entire population (4.8%) (Fig. 3A). Based on this result, we inferred that target sequencing detected relatively more mutations than WES for the size of its coding region.
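Taking the two reported percentages at face value, the relative enrichment of panel-detected variants can be written as a simple ratio. The symbols below are introduced here for illustration and do not appear in the original analysis.

$$
\frac{N_{\text{panel}} / N_{\text{WES}}}{L_{\text{panel}} / L_{\text{WES}}} = \frac{4.8\%}{3.5\%} \approx 1.37
$$

Here $N$ denotes the number of variants identified and $L$ the size of the coding region. A value above 1 means that, per megabase of coding sequence, the panel reported more variants than WES.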
As a further step, we compared the number of mutations in each gene between WES and the target sequencing panel (Fig. 3B). We ranked the top 50 genes by the number of mutations identified by WES. Although most of these genes were not related to cancer, the list included five genes that were also identified by the target sequencing panel. In addition, we determined the variant allele frequency (VAF) and the number of mutations found by both the target sequencing panel and WES, by variant type (Fig. 3C). The mutation patterns were generally similar on both platforms, with missense mutations being the most frequent.
Finally, we investigated the expected CancerSCAN TMB value, which was determined by comparing the WES results with the CancerSCAN panel content (Supplementary Table 5). The tumor samples sequenced by WES were classified as TMB-high or TMB-low, with 10 mut/Mb as the cutoff value. When the WES results were compared with the CancerSCAN panel content in the same manner, the expected TMB-high and TMB-low classifications were similar to the experimentally determined ones.
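One way to picture this comparison is to restrict the WES calls to the genes covered by the panel and recompute TMB over the panel's coding region. The sketch below assumes that reading; the text does not spell out the procedure. The `PANEL_GENES` subset, the example calls, and the function `expected_panel_tmb` are all hypothetical.

```python
PANEL_CODING_MB = 1.1  # CancerSCAN coding region used for TMB, from the text

# Hypothetical subset of panel genes, for illustration only.
PANEL_GENES = {"TP53", "KRAS", "EGFR"}


def expected_panel_tmb(wes_variant_genes, panel_genes=PANEL_GENES,
                       region_mb=PANEL_CODING_MB):
    """Expected panel TMB from the WES calls that fall in panel genes."""
    in_panel = [gene for gene in wes_variant_genes if gene in panel_genes]
    return len(in_panel) / region_mb


# Hypothetical WES calls for one tumor: one gene symbol per somatic variant.
wes_calls = ["TP53", "TTN", "KRAS", "MUC16", "TP53", "CSMD3"]
expected = expected_panel_tmb(wes_calls)

print(round(expected, 2))
```

Matching by gene symbol is a simplification; a closer approximation would intersect variant coordinates with the panel's target intervals.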
Several lines of evidence indicate that TMB is a predictive biomarker for ICI in several types of cancer (3). Moreover, TMB is now a component of the ICI treatment guidelines for some types of solid cancer. However, there is no consensus on how to measure TMB.
Compared to WES, which is considered the gold standard method for TMB calculation, target sequencing offers a shorter turnaround time and greater cost-effectiveness. However, there are various technical differences between the two platforms, including the number of genes, the sequencing coverage, and the variant calling algorithms; therefore, it is difficult to compare the efficacy of the two methods. In particular, each component directly affects the number of variants identified in a sample, which consequently causes differences between the TMB values of the two methods. Last but not least, different algorithms may be used to filter out variants that are irrelevant to TMB calculation; these algorithms differ according to the characteristics of each target sequencing panel. Therefore, an LDT target sequencing panel must be extensively validated against the standard method. Hence, a comparison with WES should be performed before the panel is used clinically to select patients who are most likely to benefit from ICI.
In this study, we evaluated the efficacy of the LDT target sequencing panel, CancerSCAN, in detecting samples with high mutation burden by comparing it with the WES results of surgically resected NSCLC samples (n = 100). CancerSCAN has been applied at the clinical level to determine genomic alterations for targeted therapy (26-33). In addition, the CancerSCAN results showed that a definite number of mutations could be identified and processed as TMB for the reference samples. The CancerSCAN panel target was 1.1 Mb, and variants were identified according to an annotation database that provides ethnicity-specific mutations, without matched normal samples. Since this panel was initially designed for cancer patients, most of the genes included in the panel were cancer-related genes, such as oncogenes and tumor suppressor genes. In addition, the panel was developed to identify variants with low VAF. This implies that more mutations, which would be missed at a coverage of 100× to 200×, were captured in this experiment. All these factors are key elements that could cause a discrepancy between pTMB and wTMB values, which compelled us to make a direct comparison with the standard method.
In this study, we observed a high correlation between pTMB and wTMB values (R² = 0.71). For the high TMB values detected by panel sequencing and WES, the concordance was found to be 93.0%. The analysis followed several stepwise approaches, with assumptions made in advance. Firstly, several exploratory analyses were performed to evaluate potential bias: we tested for technical issues in advance using reference samples with known TMB values. Secondly, we used a higher cut-off value for pTMB. Various cut-off values have been proposed in a number of studies, which tested different cancer types with different methods (Supplementary Table 6) (15, 20, 34, 35). These findings indicate that it is challenging to define a standardized cut-off value for high TMB because of the unique components of individual panels. The same issue was faced during the fine-tuning of CancerSCAN, that is, the TMB based on HLA was adjusted after considering the characteristics of the patient (20). Hence, we set a relatively high cutoff value of 16 or more mutations per megabase for the identification of TMB-h patients. Lastly, the variant filtering process used for the calculation of TMB was performed in seven steps.
This study has some limitations. Our samples were acquired from surgically resected tissues, for which clinical response data related to ICI were limited. In addition, the samples used for this study were pre-selected histologically, and their tumor purity was pathologically found to be more than 30%. Therefore, the results are difficult to interpret for samples of low purity. Although factors such as sample type, cancer type, and sequencing technique affect TMB calculation, our study showed that TMB from a well-curated LDT based on pre-defined criteria can be used as an alternative to the WES method.
As an exploratory approach, three types of TMB reference materials were provided by SeraCare (https://www.seracare.com/): one TMB-low sample, expected to have 7 mut/Mb, and two TMB-high samples, expected to have 20 and 26 mut/Mb, respectively. These reference samples were designed using a human lung cancer cell line with a minimum tumor requirement of at least 30%. Reference samples were processed as formalin-fixed paraffin-embedded (FFPE) samples sectioned at a thickness of 10 µm. DNA was extracted from the FFPE sections according to the manufacturer’s protocol, and the analysis was duplicated to evaluate reproducibility.
For validation with clinical samples, fresh samples obtained from NSCLC patients (n = 100) were used. These specimens were acquired from surgically resected fresh tissue, which was deidentified and stored in the tissue bank. The surgical stages of the samples, based on the AJCC 8th edition TNM staging, were IVA (n = 1), IIIB (n = 10), IIIA (n = 23), IIB (n = 29), IIA (n = 8), IB (n = 24), and IA (n = 5). The samples comprised both squamous cell carcinoma (n = 30) and adenocarcinoma (n = 70).
For the TMB calculation, the laboratory-developed test (LDT) target sequencing panel, CancerSCANⓇ v3.0 (K-MASTER panel), and whole exome sequencing were performed on the same samples. CancerSCANⓇ v3.0 is a hybrid capture panel (a customized panel from Twist Biosciences, CA, USA) that targets approximately 1.73 Mbp at an average sequencing coverage of 800× (22). CancerSCANⓇ v3.0 is designed to cover the exons of 407 genes and the introns of 3 genes for fusion hotspots, and it incorporates about 4,000 additional single-nucleotide polymorphisms (SNPs) located almost evenly across the chromosomes for copy number variation (CNV) purity correction, as well as particular regions for microsatellite instability (MSI) detection. The total area targeted by the panel is 1.73 Mbp, but only the 1.1 Mbp coding region was used for the TMB calculation. Whole exome captures were prepared using the Twist human core exome kit (Twist Biosciences). Sequencing was performed on an Illumina NextSeq550 (Illumina Inc., CA, USA) after library preparation (Supplementary Table 7).
Sequence reads were mapped to the human genome (hg19) using the Burrows-Wheeler Aligner (26). Duplicate reads were removed using Picard tools (https://broadinstitute.github.io/picard/) and SAMtools (27), and local alignment was optimized with the Genome Analysis Toolkit (28). For the detection of single nucleotide variants with CancerSCANⓇ, the results of two variant callers, MuTect (29) and LoFreq (30), were integrated to increase sensitivity, particularly for low-VAF variants. Pindel (31) was used to detect indels. For variant detection in WES, MuTect and Pindel were used.
We used pre-defined stepwise criteria to evaluate TMB from CancerSCANⓇ (Table 1). All variants in the coding region (in other words, the exonic region), both synonymous and nonsynonymous, were included. This strategy helps to overcome the limitation arising from the small size of the panel. In addition, because CancerSCANⓇ sequences only tumor tissue, without a matched normal sample, using deep sequencing at a target depth of up to 800×, it was inevitable that low variant allele frequency (VAF) variants or germline variants could be included in the results. To filter out germline variants, we used public and in-house databases (22). Based on previous reports (19), TMB is typically defined as the number of coding mutations, including base substitutions and insertions and deletions (indels), per megabase (Mb) of the genome examined. The number of genomic alterations remaining after the filtering process was divided by 33 Mb for WES-derived results and by 1.1 Mb for CancerSCANⓇ-derived results.
To classify TMB as high or low for WES, we adopted the TMB-high cutoff of 10 mutations per Mb that was used for the US Food and Drug Administration approval of pembrolizumab for solid cancer. For CancerSCAN, we set a higher cutoff (≥ 16 mut/Mb) based on the observation that the misclassification rate of TMB-high increases with decreasing coding sequence region and with increased sequencing depth in panel sequencing (17, 19).
This study was supported by Bristol Myers Squibb. The biospecimens of this study were provided by Samsung Medical Center BioBank (2019-0029).
C. Lee is a senior researcher at Geninus. NKD. Kim is a director at Geninus. WY. Park is a chief executive officer at Geninus Inc. No potential conflicts of interest were disclosed by the other authors.
Table 1. Variant filtering steps for the TMB calculation using CancerSCAN v3.0
Steps | Category | Filter out criteria |
---|---|---|
1 | Consequence of variants | Non-coding region with splice site |
2 | Chromosomal location | Mitochondrial DNA |
3 | Variants allele frequency (VAF) | LowVAF < 0.05 or HighVAF > 0.4 |
4 | Supporting reads | Reads ≤ 4 |
5 | Clinical significance | Benign |
6 | Minor allele frequency | gnomAD ≥ 0.0001 or 1000G EAS, KRGDB, KOVA ≥ 0.001 |
7 | Strand bias between forward and reverse reads | P value ≥ 0.05 by Fisher’s exact test |
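
The seven filtering steps in the table can be sketched as a sequence of checks. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Variant` record and its field names are hypothetical, the population databases are collapsed into two allele-frequency fields, the 1000G EAS, KRGDB, and KOVA criterion is read as "any of the three at or above 0.001", and the direction of the strand-bias threshold is copied literally from the table.

```python
from dataclasses import dataclass

MITO_CHROMS = {"MT", "M", "chrM", "chrMT"}


@dataclass
class Variant:
    """Minimal, hypothetical variant record; field names are illustrative."""
    chrom: str
    in_coding_region: bool      # False for non-coding calls, incl. splice sites
    vaf: float                  # variant allele frequency (0 to 1)
    supporting_reads: int       # reads supporting the alternate allele
    clinical_significance: str  # e.g. "Benign", "Pathogenic", "Unknown"
    gnomad_af: float            # gnomAD minor allele frequency
    east_asian_af: float        # highest of 1000G EAS, KRGDB, KOVA (assumed)
    strand_bias_p: float        # Fisher's exact test P value


def failed_step(v):
    """Return the first Table 1 step (1-7) that removes v, or None if kept."""
    if not v.in_coding_region:
        return 1
    if v.chrom in MITO_CHROMS:
        return 2
    if v.vaf < 0.05 or v.vaf > 0.4:
        return 3
    if v.supporting_reads <= 4:
        return 4
    if v.clinical_significance.lower() == "benign":
        return 5
    if v.gnomad_af >= 0.0001 or v.east_asian_af >= 0.001:
        return 6
    # Threshold direction is taken literally from Table 1.
    if v.strand_bias_p >= 0.05:
        return 7
    return None


def panel_tmb(variants, region_mb=1.1):
    """Count variants surviving all seven steps and divide by region size."""
    kept = [v for v in variants if failed_step(v) is None]
    return len(kept) / region_mb


# Hypothetical variants, invented for illustration only.
calls = [
    Variant("chr17", True, 0.22, 35, "Pathogenic", 0.0, 0.0, 0.01),
    Variant("chr7", True, 0.49, 120, "Unknown", 0.0, 0.0, 0.01),
    Variant("chr12", True, 0.20, 50, "Unknown", 0.002, 0.0, 0.01),
]
steps = [failed_step(v) for v in calls]

print(steps)
print(round(panel_tmb(calls), 2))
```

Returning the step number rather than a plain boolean makes it easy to tabulate how many variants each criterion removes.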