Description
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a relatively new
technology that enables geneticists and medical researchers to edit the genome by removing,
adding, or altering sections of DNA. CRISPR sequences were first found in the genomes of prokaryotes such as
bacteria and archaea, and the technology shows promise for treating illnesses such as blindness and cancer. A significant
issue for the practical application of CRISPR systems is accurately predicting single guide RNA
(sgRNA) on-target efficacy and off-target sensitivity. While some methods classify sgRNA designs, most
algorithms are trained on separate datasets covering different genes and cells. The limited generalizability of these methods
hinders the use of these guides in clinical trials because, for each treatment, the design process must be repeated
with its own dataset, which brings its own problems. Here we address this generalizability
problem and present general and targeted prediction models that help researchers optimize
the design of sgRNAs with high sensitivity. First, we tackled the problem by leveraging Latent Profile
Analysis and Ensemble Learning techniques to combine previous algorithms. However, the results obtained
with these methods were not satisfactory because their predictions disagreed considerably. Finally,
we proposed a novel attention-based model that is comparable in accuracy to existing methods. In addition, our
method offers the advantage of generalizability: it provides informative estimates of
sgRNA on-target efficiency and can quickly learn to predict for new genes or cells.
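
The ensemble step mentioned above, combining the outputs of existing predictors and checking how much they disagree, can be illustrated with a minimal sketch. This is not the procedure used in this work. It assumes a simple rank-average ensemble, and the predictor names (tool_a, tool_b, tool_c) and all scores are hypothetical values chosen for illustration only.

```python
from statistics import mean, pstdev


def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank, so predictors that report on
    different scales become comparable. Ties are not treated specially."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(scores) - 1) if len(scores) > 1 else 0.0
    return ranks


def ensemble(predictions):
    """Combine per-guide scores from several predictors.

    predictions: dict mapping a predictor name to a list of scores,
    with every list in the same guide order.
    Returns (consensus, disagreement): the mean and the population
    standard deviation of the rank-normalized scores for each guide.
    """
    normalized = [rank_normalize(s) for s in predictions.values()]
    per_guide = list(zip(*normalized))
    consensus = [mean(g) for g in per_guide]
    disagreement = [pstdev(g) for g in per_guide]
    return consensus, disagreement


# Hypothetical scores from three hypothetical predictors for four guides.
predictions = {
    "tool_a": [0.90, 0.20, 0.55, 0.10],
    "tool_b": [71.0, 35.0, 80.0, 12.0],
    "tool_c": [0.30, 0.40, 0.95, 0.05],
}
consensus, disagreement = ensemble(predictions)
for i, (c, d) in enumerate(zip(consensus, disagreement)):
    print(f"guide {i}: consensus={c:.2f} disagreement={d:.2f}")
```

A guide with a large disagreement value is one on which the predictors rank it very differently, which is the kind of inconsistency that makes a combined score hard to trust.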