DeepGenePrior: A Deep Learning Model for Prioritizing Genes Affected by Copy Number Variants
Neurodevelopmental disorders are genetically highly heterogeneous. They are characterized by abnormalities in the development of the central nervous system that lead to diminished physical or intellectual capabilities. Determining which genes drive a disease rather than merely accompany it as passengers, a task termed ‘gene prioritization,’ remains an open problem. Genome-wide exploration of disease-gene associations is still underdeveloped because existing methods rely on previous discoveries to identify new genes and on other evidence sources that contain false positive or false negative relations. This paper introduces DeepGenePrior, a deep neural network model that prioritizes candidate genes in diseases mediated by copy number variants (CNVs). Building on the well-studied Variational AutoEncoder (VAE), we developed a score that measures the impact of genes on the target diseases.
Unlike other methods, which use prior data on gene-disease associations to prioritize candidate genes (the guilt-by-association principle), the current study relies exclusively on copy number variants. The procedure can therefore identify disease-associated genes regardless of prior knowledge or auxiliary data sources. We identified genes that distinguish cases of three disorders (autism, schizophrenia, and developmental delay) from controls. Compared with previous studies, fold enrichment increased by 12% for brain-expressed genes and by 15% for genes associated with mouse nervous system phenotypes. We also explored sexual dimorphism in the disorders and discovered genes that are overexpressed in one sex relative to the other. Additionally, we investigated the gene ontology of the putative genes with WebGestalt, as well as the associations between the causative genes and other phenotypes in the DECIPHER dataset. Furthermore, some genes appeared among the top-ranked genes for all three disorders: deletions in ZDHHC8, DGCR5, and CATG00000022283 were common to all of them. These findings suggest a common etiology for these clinically distinct conditions.
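For readers unfamiliar with the fold enrichment measure reported above, the following is a minimal sketch of how such a value is commonly computed: the fraction of prioritized genes that carry an annotation (for example, brain expression), divided by the fraction of background genes that carry it. The abstract does not spell out the exact formula used in this study, so this common definition is an assumption, and the function name `fold_enrichment` and the gene identifiers are hypothetical placeholders rather than data from the paper.

```python
def fold_enrichment(selected, annotated, background):
    """Fold enrichment of an annotation among selected genes.

    Assumed (common) definition, not necessarily the one used in the paper:
        (annotated fraction among selected) / (annotated fraction in background)
    """
    background = set(background)
    # Restrict both gene sets to the background universe.
    selected = set(selected) & background
    annotated = set(annotated) & background
    if not selected or not annotated:
        raise ValueError("selected and annotated sets must overlap the background")

    observed = len(selected & annotated) / len(selected)
    expected = len(annotated) / len(background)
    return observed / expected


# Toy data with hypothetical gene identifiers (not from the study).
background = {f"GENE{i}" for i in range(100)}
annotated = {f"GENE{i}" for i in range(20)}  # 20% of the background is annotated
selected = {"GENE0", "GENE1", "GENE2", "GENE50", "GENE51"}  # 3 of 5 are annotated

fe = fold_enrichment(selected, annotated, background)
print(f"fold enrichment: {fe:.2f}")
```

In this toy case, 60% of the selected genes are annotated against 20% of the background, so the selected set is threefold enriched. A value above 1 indicates that the annotation is over-represented among the prioritized genes.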
With DeepGenePrior, we address the obstacles in existing gene prioritization studies. Using deep learning, this study identified promising candidate genes without prior knowledge of diseases or phenotypes.