Gene ontology

Gene ontology, or GO, is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species.[1] More specifically, the project aims to:

  1. Maintain and develop its controlled vocabulary of gene and gene product attributes;
  2. Annotate genes and gene products, and assimilate and disseminate annotation data;
  3. Provide tools for easy access to all aspects of the data provided by the project, and to enable functional interpretation of experimental data using the GO, for example via enrichment analysis.
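
Enrichment analysis, mentioned in the third aim, is often framed as a one-sided hypergeometric test: given a study set of genes, how surprising is the number that carry a particular GO term? The following is a minimal sketch of that calculation using only the Python standard library. The function name and all counts are hypothetical and chosen for illustration; real tools typically add multiple-testing correction on top of this.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

# Hypothetical counts, for illustration only: 1000 genes in the background,
# 50 annotated to the term, 20 genes in the study set, 5 of them annotated.
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=20, hits=5)
print(f"p = {p_value:.3g}")
```

With these made-up numbers only about one annotated gene would be expected in the study set by chance, so observing five gives a small p-value.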

GO is part of a larger classification effort, the Open Biomedical Ontologies (OBO).[2]

GO terms and ontology

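From a practical view, an ontology is a representation of something we know about. Ontologies consist of a representation of things that are detectable or directly observable, and the relationships between those things. There is no universal standard terminology in biology and related domains, and term usages may be specific to a species, research area or even a particular research group. This makes communication and sharing of data more difficult. The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

  • cellular component: the parts of a cell or its extracellular environment;
  • molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis;
  • biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units such as cells, tissues, organs and organisms.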

Each GO term within the ontology has a term name, which may be a word or string of words; a unique alphanumeric identifier; a definition with cited sources; and a namespace indicating the domain to which it belongs. Terms may also have synonyms, which are classed as being exactly equivalent to the term name, broader, narrower, or related; references to equivalent concepts in other databases; and comments on term meaning or usage. The GO ontology is structured as a directed acyclic graph, and each term has defined relationships to one or more other terms in the same domain, and sometimes to other domains. The GO vocabulary is designed to be species-neutral, and includes terms applicable to prokaryotes and eukaryotes, single and multicellular organisms.
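
Because the ontology is a directed acyclic graph rather than a tree, a term can have more than one parent, and finding everything a term "is" means walking every path towards the root. The following is a minimal sketch of that traversal. The graph and its identifiers are placeholders invented for illustration, not real GO accessions, and the function name is an assumption.

```python
from collections import deque

# Toy graph: child -> list of parents. The identifiers are placeholders,
# not real GO accessions.
PARENTS = {
    "TERM:leaf": ["TERM:mid1", "TERM:mid2"],  # two parents: a DAG, not a tree
    "TERM:mid1": ["TERM:root"],
    "TERM:mid2": ["TERM:root"],
    "TERM:root": [],
}

def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another path
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen

print(sorted(ancestors("TERM:leaf", PARENTS)))
```

The `seen` set is what keeps a shared ancestor such as `TERM:root` from being visited once per path.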

GO is not static, and additions, corrections and alterations are suggested by, and solicited from, members of the research and annotation communities, as well as by those directly involved in the GO project. For example, an annotator may request a specific term to represent a metabolic pathway, or a section of the ontology may be revised with the help of community experts (e.g. [3]). Suggested edits are reviewed by the ontology editors, and implemented where appropriate.

The GO ontology file is freely available from the GO website[4] in a number of formats, or can be accessed online using the GO browser AmiGO. The Gene Ontology project also provides downloadable mappings of its terms to other classification systems.

Example GO term

id:         GO:0000016
name:       lactase activity
namespace:  molecular_function
def:        "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym:    "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym:    "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref:       EC:3.2.1.108
xref:       MetaCyc:LACTASE-RXN
xref:       Reactome:20536
is_a:       GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds

Data source: [5]
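
A term stanza like the one above is a list of `tag: value` lines, and some tags (`synonym`, `xref`, `is_a`) can repeat. The following is a minimal sketch of reading such a stanza into a dictionary of lists. It is not a full OBO parser: the function name is an assumption, and the handling of `!` comments is simplified and would mis-handle a quoted string that itself contains ` ! `.

```python
from collections import defaultdict

# A shortened copy of the example stanza.
STANZA = '''\
id: GO:0000016
name: lactase activity
namespace: molecular_function
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
'''

def parse_stanza(text):
    """Map each tag to the list of values it takes in the stanza."""
    term = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so "GO:0000016" stays intact.
        tag, _, value = line.partition(":")
        # Simplification: treat everything after " ! " as a comment.
        value = value.split(" ! ")[0].strip()
        term[tag.strip()].append(value)
    return dict(term)

term = parse_stanza(STANZA)
print(term["id"], term["is_a"], len(term["synonym"]))
```

Storing every tag as a list keeps single-valued and repeated tags uniform, at the cost of indexing `[0]` for tags such as `id` and `name`.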

Annotation

Genome annotation is the practice of capturing data about a gene product, and GO annotations use terms from the GO ontology to do so. The members of the GO Consortium submit their annotations for integration and dissemination on the GO website, where they can be downloaded directly or viewed online using AmiGO. In addition to the gene product identifier and the relevant GO term, GO annotations include the following data:

  • The reference used to make the annotation (e.g. a journal article)
  • An evidence code denoting the type of evidence upon which the annotation is based
  • The date and the creator of the annotation

The evidence code comes from the Evidence Code Ontology, a controlled vocabulary of codes covering both manual and automated annotation methods. For example, Traceable Author Statement (TAS) means a curator has read a published scientific paper and the metadata for that annotation bears a citation to that paper; Inferred from Sequence Similarity (ISS) means a human curator has reviewed the output from a sequence similarity search and verified that it is biologically meaningful. Annotations from automated processes (for example, remapping annotations created using another annotation vocabulary) are given the code Inferred from Electronic Annotation (IEA). As of April 1, 2010, over 98% of all GO annotations were inferred computationally, not by curators.[6] As these annotations are not checked by a human, the GO Consortium considers them to be less reliable and includes only a subset in the data available online in AmiGO. Full annotation data sets can be downloaded from the GO website. To support the development of annotation, the GO Consortium provides study camps and mentors to new groups of developers.

Example annotation

Gene product:    Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term:         heart contraction ; GO:0060047 (biological process)
Evidence code:   Inferred from Mutant Phenotype (IMP)
Reference:       PMID 17611253
Assigned by:     UniProtKB, June 6, 2008

Data source: [7]
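
The fields of an annotation map naturally onto a small record type, and the evidence code makes it easy to separate curated annotations from electronically inferred ones. The following is a minimal sketch under stated assumptions: the class, field names and filter function are invented for illustration and do not correspond to any GO file format, and the second record is hypothetical, added only so the filter has something to exclude.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence: str      # evidence code, e.g. "IMP", "TAS", "ISS", "IEA"
    reference: str
    assigned_by: str
    date: str          # ISO 8601 date

ANNOTATIONS = [
    # The example annotation shown above.
    Annotation("UniProtKB:P68032", "GO:0060047", "IMP",
               "PMID:17611253", "UniProtKB", "2008-06-06"),
    # Hypothetical electronically inferred record, for contrast only.
    Annotation("UniProtKB:P68032", "GO:0060047", "IEA",
               "REF:hypothetical", "EXAMPLE_DB", "2010-01-01"),
]

def curated_only(annotations):
    """Keep annotations whose evidence code is anything other than IEA."""
    return [a for a in annotations if a.evidence != "IEA"]

for annotation in curated_only(ANNOTATIONS):
    print(annotation.gene_product, annotation.go_id, annotation.evidence)
```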

Tools

A large number of tools, both online and downloadable, use the data provided by the GO project.[8] The vast majority of these come from third parties; the GO Consortium develops and supports two tools, AmiGO and OBO-Edit.

AmiGO[9] is a web-based application that allows users to query, browse and visualize ontologies and gene product annotation data. It also has a BLAST tool,[10] tools allowing analysis of larger data sets,[11][12] and an interface to query the GO database directly.[13]

AmiGO can be used online at the GO website to access the data provided by the GO Consortium, or can be downloaded and installed for local use on any database employing the GO database schema (e.g. [14]). It is free open source software and is available as part of the go-dev software distribution.[15]

OBO-Edit[16] is an open source, platform-independent ontology editor developed and maintained by the Gene Ontology Consortium. It is implemented in Java, and uses a graph-oriented approach to display and edit ontologies. OBO-Edit includes a comprehensive search and filter interface, with the option to render subsets of terms to make them visually distinct; the user interface can also be customized according to user preferences. OBO-Edit also has a reasoner that can infer links that have not been explicitly stated, based on existing relationships and their properties. Although it was developed for biomedical ontologies, OBO-Edit can be used to view, search and edit any ontology. It is freely available to download.[15]
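
To illustrate what inferring unstated links from relationship properties can look like, the following is a minimal sketch of a fixed-point computation over two relations. It is not OBO-Edit's algorithm. It assumes only `is_a` and `part_of` links, that both are transitive, and that any chain mixing the two yields `part_of`; the function names and the toy terms X, Y, Z, W are hypothetical.

```python
def compose(first, second):
    """Relation implied by A -first-> B -second-> C under the assumed rules."""
    if first == "is_a" and second == "is_a":
        return "is_a"
    return "part_of"  # any chain containing part_of yields part_of

def infer_links(links):
    """Return the links implied by `links` but not stated in it.

    `links` is a set of (child, relation, parent) triples whose relation
    is either "is_a" or "part_of".
    """
    known = set(links)
    changed = True
    while changed:  # repeat until no new link can be derived
        changed = False
        for (a, r1, b) in list(known):
            for (b2, r2, c) in list(known):
                if b == b2:
                    new = (a, compose(r1, r2), c)
                    if new not in known:
                        known.add(new)
                        changed = True
    return known - set(links)

# Toy ontology with placeholder term names.
STATED = {
    ("X", "is_a", "Y"),
    ("Y", "part_of", "Z"),
    ("Z", "is_a", "W"),
}

for link in sorted(infer_links(STATED)):
    print(link)
```

The nested loop is quadratic in the number of links per pass, which is fine for a toy example but would need indexing by term for an ontology of realistic size.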

GO Consortium

The GO Consortium is the set of biological databases and research groups actively involved in the GO project.[17] This includes a number of model organism databases and multi-species protein databases, software development groups, and a dedicated editorial office.

History

Gene ontology was originally constructed in 1998 by a consortium of researchers studying the genomes of three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Saccharomyces cerevisiae (brewer's or baker's yeast).[18] Many other model organism databases have since joined the Gene Ontology Consortium, contributing not only annotation data but also to the development of the ontologies and of the tools used to view and apply the data. Most major plant, animal and microorganism databases now contribute to the project. As of January 2008, GO contained over 24,500 terms applicable to a wide variety of biological organisms. There is a significant body of literature on the development and use of GO, and it has become a standard tool in the bioinformatics arsenal. The consortium's objectives are threefold: building the ontology, assigning ontology terms to genes and gene products, and developing software and databases that support the first two objectives.

References

  1. The Gene Ontology Consortium (January 2008). "The Gene Ontology project in 2008". Nucleic Acids Res. 36 (Database issue): D440–4. doi:10.1093/nar/gkm883. PMC 2238979. PMID 17984083.
  2. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S (November 2007). "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration". Nat. Biotechnol. 25 (11): 1251–5. doi:10.1038/nbt1346. PMC 2814061. PMID 17989687.
  3. Diehl AD, Lee JA, Scheuermann RH, Blake JA (April 2007). "Ontology development for biological systems: immunology". Bioinformatics 23 (7): 913–5. doi:10.1093/bioinformatics/btm029. PMID 17267433.
  4. "Gene Ontology Database". Gene Ontology Consortium.
  5. The GO Consortium (2009-03-16). "gene_ontology.1_2.obo" (OBO 1.2 flat file). Retrieved 2009-03-16.
  6. "The what, where, how and why of gene ontology—a primer for bioinformaticians". Brief Bioinform. doi:10.1093/bib/bbr002.
  7. The GO Consortium (2009-03-16). "AmiGO: P68032 Associations". Retrieved 2009-03-16.
  8. Mosquera JL, Sánchez-Pla A (July 2008). "SerbGO: searching for the best GO tool". Nucleic Acids Res. 36 (Web Server issue): W368–71. doi:10.1093/nar/gkn256. PMC 2447766. PMID 18480123.
  9. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S; AmiGO Hub; Web Presence Working Group (2008). "AmiGO: Online access to ontology and annotation data". Bioinformatics 25 (2): 288–289. doi:10.1093/bioinformatics/btn615. PMC 2639003. PMID 19033274.
  10. AmiGO BLAST tool
  11. AmiGO Term Enrichment tool; finds significant shared GO terms in an annotation set
  12. AmiGO Slimmer; maps granular annotations up to high-level terms
  13. GOOSE, GO Online SQL Environment; allows direct SQL querying of the GO database
  14. The Plant Ontology Consortium (2009-03-16). "Plant Ontology Consortium". Retrieved 2009-03-16.
  15. "Gene Ontology downloads at SourceForge". Retrieved 2009-03-16.
  16. Day-Richter J, Harris MA, Haendel M, Lewis S (2007). "OBO-Edit: an ontology editor for biologists". Bioinformatics 23 (16): 2198–2200. doi:10.1093/bioinformatics/btm112. PMID 17545183.
  17. "The GO Consortium". Retrieved 2009-03-16.
  18. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (May 2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium". Nat. Genet. 25 (1): 25–9. doi:10.1038/75556. PMC 3037419. PMID 10802651.

External links

  • Gene Ontology Consortium — Provides access to the ontologies, software tools, annotated gene product lists, and reference documents describing the GO and its uses.
  • GO Term Enrichment is a common statistical method used to identify shared associations between proteins and annotations to GO.
  • GONUTS Wiki — third party GO term documentation, including links to GO annotations at many major model organism databases.
  • GOCat — Automatic GO Categorizer/Browser to help Functional Annotation of Biomedical Texts; useful to functionally characterize protein and gene names lists generated by high-throughput experiments
  • Protein Ontology Project — Protein Ontology (PO), reference documents describing the PO and its uses.
  • EAGLi — a terminology-powered (Gene Ontology, Swiss-Prot keywords, etc.) biomedical question answering engine for MEDLINE
  • PubOnto — Medline Exploration based on Gene Ontology and other ontologies.
  • WikiProfessional — disambiguation, knowledge generation and collaborative intelligence.
  • SimCT — web-based tool to display relationships between biological objects annotated to an ontology, in the form of a clustering tree.
  • SerbGO — a GO tool to compare the capabilities of different programs to show their common features and their differences and to find which tools, if any, have some specific user-required capabilities for GO analysis.
  • Domain-centric Gene Ontology — database of domain-centric ontologies on functions, phenotypes, diseases and more.
  • GO2PUB — query PubMed with semantic expansion of gene ontology terms