In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms. Statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. Today, with comprehensive genome sequence and powerful computational resources at the disposal of the research community, gene finding has been redefined as a largely computational problem.
Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. Predicting the function of a gene and confirming that the gene prediction is accurate still demand in vivo experimentation[1] through gene knockout and other assays, although frontiers of bioinformatics research[2] are making it increasingly possible to predict the function of a gene based on its sequence alone.
Gene prediction is one of the key steps in genome annotation, following sequence assembly, the filtering of non-coding regions and repeat masking.[3]
Gene prediction is closely related to the so-called 'target search problem', which investigates how DNA-binding proteins (transcription factors) locate specific binding sites within the genome.[4][5] Many aspects of structural gene prediction are based on current understanding of underlying biochemical processes in the cell, such as gene transcription, translation, protein–protein interactions and regulation processes, which are the subject of active research in the various omics fields such as transcriptomics, proteomics, metabolomics, and more generally structural and functional genomics.
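As a toy illustration of how knowledge of translation can feed into structural gene prediction, the sketch below scans the forward strand of a DNA string for open reading frames (ORFs): a start codon followed in the same frame by a stop codon. It is only a minimal sketch under stated assumptions. The function name `find_orfs`, the `min_codons` threshold and the example sequence are hypothetical, the standard genetic code is assumed, and the reverse strand, introns and statistical scoring are ignored, so it should not be read as how any particular gene finder works.

```python
# Naive forward-strand ORF scan (illustrative only, not a real gene finder).
# Assumes the standard genetic code: ATG as start, TAA/TAG/TGA as stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    `start` is the index of the A in ATG, `end` is exclusive and includes
    the stop codon, and `frame` is 0, 1 or 2. `min_codons` counts codons
    from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return sorted(orfs)


# Hypothetical example sequence: ATG AAA TTT GGG TAA embedded in frame 2.
example = "CCATGAAATTTGGGTAACC"
print(find_orfs(example))
```

On the example string this reports a single ORF spanning positions 2 to 17 in frame 2. Raising `min_codons` filters out short ORFs, which occur frequently by chance in real genomic sequence.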
1. Sleator RD (August 2010). "An overview of the current status of eukaryote gene prediction strategies". Gene. 461 (1–2): 1–4. doi:10.1016/j.gene.2010.04.008. PMID 20430068.
2. Ejigu GF, Jung J (September 2020). "Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing". Biology. 9 (9): 295. doi:10.3390/biology9090295. ISSN 2079-7737. PMC 7565776. PMID 32962098.
3. Yandell M, Ence D (April 2012). "A beginner's guide to eukaryotic genome annotation". Nature Reviews. Genetics. 13 (5): 329–42. doi:10.1038/nrg3174. PMID 22510764. S2CID 3352427.
4. Redding S, Greene EC (May 2013). "How do proteins locate specific targets in DNA?". Chemical Physics Letters. 570: 1–11. Bibcode:2013CPL...570....1R. doi:10.1016/j.cplett.2013.03.035. PMC 3810971. PMID 24187380.
5. Sokolov IM, Metzler R, Pant K, Williams MC (August 2005). "Target search of N sliding proteins on a DNA". Biophysical Journal. 89 (2): 895–902. Bibcode:2005BpJ....89..895S. doi:10.1529/biophysj.104.057612. PMC 1366639. PMID 15908574.