Bioinformatics
GSLIS 672
Winter 2005
Paper by: Tomasz Neugebauer
Knowledge Representation and Ontologies in Bioinformatics
Introduction
Efficient and reliable knowledge representation is necessary to apply prior knowledge to unknown entities. This paper is a review of knowledge representation in bioinformatics through the use of ontologies. The first part of the paper presents the context of the need for vocabulary control and ontologies. Some theoretical foundations of ontologies with examples in the context of bioinformatics are presented. A knowledge base is defined as “the combination of an ontology with associated instances” (Stevens et al 2000, 401). I conclude with the challenge of interoperability among biological knowledge bases and how ontologies are playing a part in meeting that challenge.
In addition, a table that summarizes the properties of some of the example ontologies and controlled vocabularies discussed in the paper, a glossary of terms, and a table demonstrating the syntax of the TAMBIS ontology are included.
Growth of Biological Information and the Need for Vocabulary Control
There is a recognized need to deal with the multitude of heterogeneous and autonomous data sources that cover genomic, cellular, structure, phenotype, raw and analyzed molecular data, proteomic, and more (Fenstermacher 2005, 444; Schulze-Kremer 2002). The quantity of new genomic sequence data and interaction information is too large and complex to be implicitly understood by biologists without the use of information technologies. Baker et al. point out that “not only is the rate of data acquisition growing exponentially, but also a single experiment can collect data on a huge range of molecules [e.g.: 10 000 different mRNA species] that would need an army of experts to be interpreted” (Baker et al 1999, 510).
Many argue that biology is a knowledge-based discipline, as opposed to an axiomatic one (Baker et al. 1999, 510; Stevens et al. 2002, 135). Novel hypotheses and interpretations of results in biology are made “by comparing data in hand against existing knowledge” (Baker et al. 1999, 510). For example:
· Hypotheses and predictions of protein function, structure and interactions from a sequence and vice versa are made by comparing to previously characterized proteins.
· Hypotheses and predictions of gene functions and attributes can be generated based on comparison to previously characterized genes.
The types of similarities found determine the type of conclusions and inferences that are made. However, in order to find these similarities and new relationships we have to ensure that we are comparing apples to apples and oranges to oranges; otherwise the results will be meaningless. We cannot make valid predictions and inferences if we ignore the fact that in one database ‘apple’ includes the stem and leaves, whereas in the other it does not.
Schulze-Kremer gives the following example of the concept ‘gene’, which can be defined either as a “‘DNA fragment that can be transcribed and translated into a protein’” (2002, 1) or as a “‘DNA region of biological interest with a name and that carries a genetic trait or phenotype’ which includes nonstructural coding DNA regions like intron, promoter and enhancer” (Schulze-Kremer 2002). Schulze-Kremer claims that the former definition is used by GDB Human Genome Database and the latter by GenBank and Genome Sequence Database (GSDB) (Schulze-Kremer 2002). A single word can have many referents, and ambiguity is an inevitable aspect of human language. Since bioinformatics relies on information technologies, database structures and query terms, ambiguities need to be formalized, resolution methods established and supporting technologies built so that the user can communicate with underlying information systems with authority and vocabulary control. Kazic points out that the “penalty for unreliable definitions is the proliferation of scientifically meaningless (or worse, misleading) results, which without additional inspection are indistinguishable from meaningful ones” (Kazic 2000, 1130).
The need for a common nomenclature in human genetics was “recognised as early as the 1960s and in 1979 full guidelines for human gene nomenclature were presented at the Edinburgh Human Genome Meeting (HGM).” [1] The GDB Human Genome Database claims to use HUGO Nomenclature, established and maintained by the HUGO Gene Nomenclature Committee (HGNC). The Human Genome Organisation (HUGO), established in 1989, is the international organization of scientists involved in human genetics [2]. HGNC approves gene names and symbols and stores them in the Human Gene Nomenclature Database (Wain 2002). The guidelines for human gene nomenclature (Hester et al 2002) define a gene as “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function a gene may be characterized by sequence, transcription or homology.” The Human Gene Nomenclature Database contains over 16 000 records, and allows for retrieval of gene symbol, gene name, cytogenetic location, OMIM number and PubMed ID (Wain 2002).
In scientific communication that is unmediated by databases, human experts resolve the various semantic ambiguities implicitly through exchange of background knowledge and context. Human-computer and computer-computer communication cannot rely on informal, implicit communication. Bioinformatics requires a multidisciplinary effort on the part of database designers, programmers, molecular biologists, pharmacologists, geneticists, computer scientists, biochemists, and more. The resultant database schemas and systems should be transparently referenced and collaboratively developed, preferably with tools that are capable of formal analysis and monitoring of the evolving system. There is a distinction to be made “between the semantics of nomenclature and the syntactic need for a standard notation to represent gene function in machine-computable ways” (MacMullen et al 2005, 452). The development of ontologies that formalize concepts and the relationships between them requires an underlying semantic for the nomenclature as well as the development of notations and systems used to express the semantic programmatically.
Ontologies – Theoretical Foundations and Examples in Bioinformatics
Aristotle’s Categories specify the first well-known ontology: “Expressions which are in no way composite signify substance, quantity, quality, relation, place, time, position, state, action, or affection” [3]. These are fundamental categories that could sit as the foundation of a further subdivision, eventually reaching molecular biology (Schulze-Kremer 2002, 3). The field of biology has many useful taxonomies [4], which are limited to representing a hierarchy of subsumption. In comparison to taxonomies, ontologies are less numerous in biology, and they allow for structures based on more complex relationships (Stevens et al. 2000, 402).
General ontology is the study of “the most pervasive features of reality, such as real existence, change, time, causality, chance, life, mind and society” (Bunge, 201). Domain or special ontology “studies one genus or thing or process” (Bunge, 201) and results in “the instantiation of a concrete ontological model of that domain” (Schulze-Kremer 2002, 2). Schulze-Kremer defines ontology as a “concise and unambiguous description of principle relevant entities with their potential valid relations to each other.” Ontologies are formalized systems of concepts (defined with propositions) and axioms. The concepts describe instances, or type-of relations defined as classes. The building blocks of ontologies also include predicates (relations and attributes defined as classes) and formal inference rules such as first-order predicate and propositional logic, used for valid reasoning about concepts and their predicates (Schulze-Kremer 2002, 7).
The concepts within an ontology are synonymous with classes of entities. If membership in the class is defined only by a necessary condition, the concept is primitive, whereas if the properties that define membership in a class are both necessary and sufficient, the concept is defined (Stevens et al. 2000, 399). For example, the atomic number is sufficient to describe an atom (therefore atom is a defined concept), whereas having a hydrophobic core is only a necessary condition for membership in the class globular protein (non-globular proteins could also have a hydrophobic core), so globular protein is a primitive concept (Stevens et al 2002, 138; Stevens et al 2000, 400). There are many types of possible relations in an ontology, and these can themselves be classified into a hierarchy. The possible relations include: generic-specific (e.g., enzyme is a kind of protein), partitive (e.g., modification site is a part of protein), associative-nominative (e.g., gene and gene name), associative-locative (e.g., chromosome has a nucleus location), associative-functional (e.g., protein has the function receptor) (Stevens et al 2000, 400).
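The difference between primitive and defined concepts can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions and is not taken from any of the cited systems: the `Concept` class, its fields, and the property names `atomic_number` and `has_hydrophobic_core` are hypothetical. The point it shows is that a defined concept can positively classify an individual, whereas a primitive concept can only exclude one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# An individual is modelled (hypothetically) as a plain dictionary of properties.
Individual = Dict[str, object]


@dataclass(frozen=True)
class Concept:
    name: str
    condition: Callable[[Individual], bool]
    defined: bool  # True: necessary and sufficient; False: necessary only

    def classify(self, individual: Individual) -> Optional[bool]:
        """Return True or False when membership is decidable, None when it is not."""
        if not self.condition(individual):
            return False  # a necessary condition failed, so membership is excluded
        # Meeting the condition proves membership only for a defined concept.
        return True if self.defined else None


# Defined concept: having an atomic number is treated as sufficient.
atom = Concept("atom", lambda x: "atomic_number" in x, defined=True)

# Primitive concept: a hydrophobic core is necessary but not sufficient.
globular_protein = Concept(
    "globular protein",
    lambda x: x.get("has_hydrophobic_core") is True,
    defined=False,
)
```

Here `atom.classify({"atomic_number": 6})` yields a definite answer, while `globular_protein.classify({"has_hydrophobic_core": True})` returns `None`, because the hydrophobic core alone does not establish membership.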
The Open Biomedical Ontologies [5] is a repository of more than 50 structured controlled vocabularies that are publicly available for the domain of biomedicine. Among these, Gene Ontology (GO) describes “attributes of gene products in three non-overlapping domains of molecular biology” formalized in three separate ontologies: molecular function (e.g., kinase activity), biological process (e.g., cell death, apoptosis), and cellular component (e.g., nuclear inner membrane) [6]. Recently a fourth ontology has been added: the Sequence Ontology (SO), which “permits the classification and standard representation of sequence features” such as ‘exon’. The list of annotation projects using GO is extensive [7]. GO is intended as a database annotation tool with the narrow scope of “gene products within an organism” (Stevens et al 2000, 405). However, Schulze-Kremer reminds us that GO is a controlled vocabulary, not a gene ontology; among its shortcomings is the inconsistent use of the main relations ISA and PARTOF: “ISA can mean ‘subclass of’ or ‘instance of’ […] similarly PART OF is found in places with the following meanings: ‘made of’, ‘belongs to’, ‘physical part of’, ‘conceptual part of’, ‘subprocess of’, ‘controls’, ‘causes’, ‘activates’, ‘inhibits’, ‘enclosed by’ and ‘binds to.’” (Schulze-Kremer 2002, 11)
Another well-known ontology is the Macromolecular Crystallographic Information File (mmCIF) [8], which is closely integrated with the Protein Data Bank (PDB) [9] and widely used in protein databases and protein structure related sciences (Köhler & Schulze-Kremer 2002, 8). mmCIF originated with support from the International Union of Crystallography with the initial goal of a consistent data representation for the exchange of data associated with the crystallographic experiment and the final molecular structure. As the data set grew, the need for more functionality such as validation and verification for improved consistency led to the development of a new Dictionary Definition Language (DDL) that is applicable to macromolecular structure in crystallography and other disciplines (Westbrook et al. 2000).
Ontologies can be classified with the following facets: domain oriented (e.g., Escherichia coli, gene function, chromosome), task-oriented (e.g., annotation analysis) and generic (Stevens et al 2000, 401). Schulze-Kremer’s Ontology for Molecular Biology (MBO) is a generic ontology intended as a common semantic for the biology database community. It contains a wide range (1200 nodes) of concepts “required to describe biological objects, experimental procedures and computational aspects of molecular biology” (Stevens et al 2000, 404). EcoCyc is an example of a domain oriented ontology that is used to specify a database schema for E. coli genes, metabolism, regulation and signal transduction (Stevens et al 2000, 404). Wroe et al describe their proposal for extending the myGRID [10] project with an ontology that is used to classify bioinformatics services such as SWISS-PROT, BLAST, etc. (Wroe et al 2003).
Interoperability of Bioinformatics Databases
Bioinformatics databases are not uniform in structure, organization, or terminology, and they use different conventions for object identification. Schulze-Kremer lists three kinds of differences among the databases: terminological (synonyms, aliases, formulae), syntactic (file structure, separators, spelling) and semantic (intra and interdisciplinary homonyms) (Schulze-Kremer 2002). The same category label or caption can have a different meaning depending on the database. Similarly, a multitude of category labels can refer to the same class of entities (Schulze-Kremer 2002). Schulze-Kremer’s MBO ontology is intended as a database annotation tool for establishing a common reference as to the semantic relationships between database entries (Stevens et al 2000, 404).
The integration of a multitude of knowledge bases for similarity and algorithmic analysis requires a formalized system that would allow for vocabulary and authority control for the various record labels and identifiers. Identifiers are particularly important from a programming perspective. According to Clark et al., Life Science Identifiers (LSID) and the LSID Resolution System (LSRS) meet the requirements for integrating multiple knowledge resources. Organizations assigning LSIDs are responsible for: (1) uniquely identifying themselves, typically with an Internet domain name, (2) ensuring uniqueness of LSIDs within their domain, (3) returning an error when an object is improperly referenced (Clark et al. 2004, 62). Well-formed LSIDs include a globally registered authority ID (e.g., ebi.ac.uk), an authority-specific namespace (e.g., SWISS-PROT.accession), an object ID (e.g., P34355) and a version number. Clark et al. give the following examples of well-formed LSIDs (Clark et al. 2004, 62):
· URN:LSID:ebi.ac.uk:SWISS-PROT.accession:P34355:3
· URN:LSID:rcsb.org:PDB:1D4X:22
· URN:LSID:ncbi.nlm.nih.gov:GenBank.accession:NT_001063:2
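The structure of these identifiers lends itself to a simple parser. The following Python sketch is an illustration only, not the LSID Resolution System itself: the names `LSID` and `parse_lsid` are hypothetical, and the sketch assumes, following the description above, that all four components (authority, namespace, object ID and version) are present and separated by colons.

```python
from typing import NamedTuple


class LSID(NamedTuple):
    authority: str   # globally registered authority ID, e.g. ebi.ac.uk
    namespace: str   # authority-specific namespace
    object_id: str   # identifier of the object within the namespace
    version: str     # version number, kept as text


def parse_lsid(text: str) -> LSID:
    """Split an identifier of the form URN:LSID:authority:namespace:object:version."""
    parts = text.strip().split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}")
    if parts[0].upper() != "URN" or parts[1].upper() != "LSID":
        raise ValueError("identifier must start with URN:LSID")
    if not all(parts[2:]):
        raise ValueError("authority, namespace, object ID and version must be non-empty")
    return LSID(*parts[2:])


example = parse_lsid("URN:LSID:ebi.ac.uk:SWISS-PROT.accession:P34355:3")
```

For the first example above, `example.authority` is `ebi.ac.uk` and `example.object_id` is `P34355`; an identifier with missing components raises a `ValueError`, which loosely mirrors the requirement that improperly referenced objects produce an error.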
The necessity of these identifiers (e.g., accession number) brings to light the fact that ontologies “must necessarily reflect a specific view of the data” (Baker et al. 1999, 510). Only the bioinformatics view of the concept of protein implies that an accession number is associated with a protein, since this number is necessary for the retrieval of that particular protein data (or metadata), such as its sequence, from a database (Baker et al. 1999, 511). An ontology for molecular biology or biochemistry that intends to describe real physical proteins has no need for an accession number.
Whether communication consists of information storage or retrieval from a database, or the verbal exchange between two scientists, symbols are used to denote abstract concepts. In the case of the former, symbols are the field labels and metadata element and qualifier names, whereas in the latter, the symbols are verbal utterances. As Kazic points out, “symbol qua symbol is mute: it doesn’t say what abstraction it denotes” (2000, 1129). The meaning of each symbol is defined and understood by entities that “know the language accurately enough to define the symbol’s semantic and is vigilant enough to detect and correct any errors of usage” (Kazic 2000, 1129).
Kazic argues that the challenge that different database semantics (i.e., metadata) pose for database interoperability cannot be solved by publicly declaring metadata information alone: “deciding if X in one database is X, Y, or some relative of Z in another” cannot be accomplished by ‘comparing the topologies of the ontologies’ nor by matching identical symbols (Kazic 2000, 1140). According to Kazic, ontologies are insufficient since they aim to classify, not define, terms (Kazic 2000, 1130). The relationships among biological entities are complex and “incorporate many types of relationship, from geometric adjacency to systems of nonlinear differential equations [that] cannot be reduced to set membership and subsumption (e.g. part-whole, or isa)” (Kazic 2000, 1130). Kazic proposes the formal language Glossa, which allows for computable definitions expressed in terms (semiotes) that satisfy requirements (unique, disjoint, elementary) which ensure an axiomatic foundation (Kazic 2000, 1132). The crucial difference is that Glossa does not classify concepts into hierarchies, and instead focuses on computable definitions (Kazic 2000, 1140). In order to use Glossa and semiotes “to share information and computations among disparate, independent systems”, the semiotes need to be publicly defined and maintained with declared interfaces and parsers between the public semiotes and locally maintained systems (Kazic 2000, 1139).
The Transparent Access to Multiple Bioinformatics Resources (TAMBIS) project uses the TAMBIS ontology (TaO) “to facilitate the interoperation and fusion of bioinformatics resources” (Stevens et al 2002, 135). The ontology is used to enable scientists to query multiple databases using a single query language (Stevens et al 2000, 405). TaO is implemented using Description Logics (DL) to capture and represent knowledge of biological concepts as well as to make inferences and automatically classify concepts based on collections of declarative statements (Baker et al 1999, 511; Stevens et al. 2002, 135). As a result of this, TaO is dynamic: “it can grow without the need for either conceptualizing or encoding new knowledge” (Stevens et al 2000, 406).
Initially, the GRAIL concept modeling language was used to implement TaO; the ontology has since been migrated to the Ontology Inference Layer (OIL) language [11]. The basic building blocks of DLs are individuals, concepts (descriptions of a class of individuals), and roles (relationships and attributes). GRAIL enables the expression of composite definitions of collections of concepts, and is able to infer a classification on them based on the relationships and attributes (Baker et al 1999, 512). For example, to express a possible relationship between motif and protein we create the following pair of composite concepts: 1. <Motif isComponentOf Protein> and 2. <Motif hasModification PostTranslationModification>, and the GRAIL classifier automatically places these below Motif in the hierarchy (Baker et al. 1999, 512). The smaller version of TaO, with 250 concepts, describes proteins and enzymes (their motifs, structure, function, process), whereas the larger 1500-concept model includes nucleic acids and genes (Stevens et al 2000, 406).
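The automatic placement of composite concepts below Motif can be illustrated with a toy classifier. The Python below is a deliberately simplified sketch and is not GRAIL or a description logic reasoner: the `Composite` class and the `subsumes` and `direct_parents` functions are hypothetical, and subsumption is reduced to the assumption that a concept with the same base and a subset of the restrictions is more general. A real classifier would also reason over hierarchies of base concepts, roles and fillers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Composite:
    base: str
    # Each restriction is a (role, filler) pair, e.g. ("isComponentOf", "Protein").
    restrictions: FrozenSet[Tuple[str, str]] = frozenset()


def subsumes(general: Composite, specific: Composite) -> bool:
    """Toy rule: same base concept, and every restriction of `general` holds in `specific`."""
    return general.base == specific.base and general.restrictions <= specific.restrictions


def direct_parents(concept: Composite, others: List[Composite]) -> List[Composite]:
    """Return the most specific concepts in `others` that subsume `concept`."""
    ancestors = [o for o in others if o != concept and subsumes(o, concept)]
    # Drop any ancestor that is itself more general than another ancestor.
    return [a for a in ancestors
            if not any(b != a and subsumes(a, b) for b in ancestors)]


motif = Composite("Motif")
component = Composite("Motif", frozenset({("isComponentOf", "Protein")}))
modified = Composite("Motif", frozenset({("hasModification", "PostTranslationModification")}))
both = Composite("Motif", component.restrictions | modified.restrictions)

hierarchy = [motif, component, modified, both]
```

Under these assumptions, `direct_parents(component, hierarchy)` and `direct_parents(modified, hierarchy)` both return only `motif`, while a concept carrying both restrictions is placed beneath the two composite concepts rather than directly beneath Motif.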
TAMBIS aims to provide a single access point for biological information sources through the use of a mediating ontology: the query is phrased using TaO and TAMBIS converts the requests to the appropriate vocabularies for each source (Baker et al. 1999, 513). TaO was designed as an ontology for retrieval rather than hypothesis generation, resulting in a ‘broad and shallow’ structure that includes the domain of molecular biology (chemical structures, functions and processes) as well as bioinformatics (content, properties and methods available in resources) (Stevens et al. 2002, 136). Among the reasons for switching from GRAIL to OIL is the availability of supporting technologies such as OilEd [12], which allows for “easy toggling of class descriptions from primitive to defined”, thus allowing for the specification of necessary versus necessary and sufficient conditions for an instance to be a member of a class (Stevens et al 2002, 137). For example, the atomic number is sufficient to describe an atom, but the property of containing carbon is only a necessary condition for being an organic molecular compound, since CO2 contains carbon but is not an organic molecular compound (Stevens et al. 2002, 138).
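The mediation step, in which a query phrased in a shared vocabulary is converted into the vocabulary of each source, can be sketched as follows. This Python is a minimal illustration and does not reproduce the TAMBIS implementation: the source names `source_a` and `source_b`, their field names, and the function `translate` are all hypothetical, and the sketch assumes that each source is described by a hand-written mapping from shared terms to local terms.

```python
from typing import Dict, List

# Hypothetical hand-written mappings: shared ontology term -> local field name.
SOURCE_VOCABULARIES: Dict[str, Dict[str, str]] = {
    "source_a": {"Protein": "prot_entry", "Motif": "pattern"},
    "source_b": {"Protein": "polypeptide"},
}


def translate(query_terms: List[str]) -> Dict[str, List[str]]:
    """Rewrite a query for every source whose vocabulary covers all of its terms."""
    plans: Dict[str, List[str]] = {}
    for source, vocabulary in SOURCE_VOCABULARIES.items():
        if all(term in vocabulary for term in query_terms):
            plans[source] = [vocabulary[term] for term in query_terms]
    return plans


protein_query = translate(["Protein"])
motif_query = translate(["Protein", "Motif"])
```

In this sketch a query about proteins alone can be sent to both sources, whereas a query that also mentions motifs is routed only to the source whose mapping covers that term. Every new source requires another mapping of this kind to be written.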
Critchlow et al. point out that TAMBIS’ federated database approach has advantages: users need only be familiar with a single ontology (TaO) while gaining automatic access to new resources as they are integrated. The user makes a query using the TaO, and TAMBIS wrappers translate this query to local source ontologies and retrieve data into a unified user environment. “Since the wrappers are manually created, however, this approach will not scale to hundreds of sources.” (Critchlow et al. 2001, 8) For this reason, Critchlow et al. suggest the DataFoundry architecture, which relies on mediator classes generated automatically given a complete set of meta-data (a domain ontology, local resource schemas, transformations, and mappings) (Critchlow et al. 2001, 11). The argument for this approach is that in the long term, due to the growing number and importance of small bioinformatics sources, what is needed is a “method to automatically identify, categorize, and support interactions with genomics sources” (Critchlow et al. 2001, 16).
Conclusions
A well-constructed thesaurus is similar to an ontology: terms represent concepts and the relations (predicates) are limited to whole-part, generic-specific, synonymy, instantiation, mental-association, antonymy, etc. ODLIS defines a thesaurus as a “[…] lexicon of terms comprising the specialized vocabulary of an academic discipline or field of study, showing the logical and semantic relations among terms[…]” where semantic relations consist of the following: “Active, Associative, Causal, Generic, Hierarchic, Locative, Partitive, Passive, Antonymous, Synonymous” (Reitz). Anyone who has constructed a thesaurus understands that the relationships expressed represent and imply a semantic that is not necessarily commonly understood or agreed upon by all the members of a domain. The thesaurus serves as a reference (and indexing tool) for communicating parties within a domain of interest.
The domain of bioinformatics requires its own ontology, and as Baker et al. point out, “this will be just one of many possible ontologies for biology” (1999, 510). Schulze-Kremer argues that due to the enormous complexity of the bioinformatics domain and the multitude of possible facets in the ontology, a ‘situated’ approach is necessary, one which emphasizes the intended use of the ontology (2002, 3). The idea that the intended use of the ontology should be considered in its construction is not new:
“STEPS IN THESAURUS CONSTRUCTION […] 3. Identify the users. What are their information needs? Will they be doing their own searching or will someone do it for them? Will their questions be broad or specific?” (Cleveland & Cleveland, 42-43)
The TaO ontology was designed to meet the information needs of biologists describing data to be retrieved from bioinformatics resources, and “a survey of questions actually asked by biologists” (Baker et al. 1999, 514) was used in its design. Bartlett & Toms develop a user-centered approach for describing information retrieval tasks in bioinformatics leading to the functional analysis of a gene sequence. The resulting task analysis can serve as the source for requirement definitions of an information retrieval system that integrates the various database sources and tools used by scientists trying to solve specific problems. Bartlett & Toms’ task analysis confirms that bioinformatics experts use a multitude of data sources and tools (Bartlett & Toms 2005, 469), so the need to integrate these into seamless systems with a common semantic and ontology seems a justified challenge.
There is a fundamental ontological distinction between 1) biological entities such as molecules, nucleic acid, and proteins and 2) the data that describes them in databases. Trying to define an ontology that will include absolutely all concepts and relations in 1 and 2 is much too complex. Therefore, we can fall back on the well established principle of controlled vocabularies: the importance of their intended use. However, the challenge of integrating these sources is not simply a question of information access, as has been the case with controlled vocabularies in LIS. The information content and ontology structures are used in relationship mapping and automatic inference generation. This raises an important question about science: to what extent is it valid to test a hypothesis about physical entities by running computations on databases? The answer to this question depends to a significant extent on the validity of the underlying semantic models and corresponding ontologies and controlled vocabularies used in the process of the computation.
Even though bioinformatics resources will continue to be used only as a supplement to experimentation with actual physical entities, there is nevertheless a growing need to hypothesize and generate conclusions based on information sources through computation. This means that information itself has definitely become the object of scientific study. Research into ontology applications, semantic web, and ontology definition languages continues to be of interest within computational biology, as evidenced by the Bio-Ontologies workshop [13] and special interest group of the International Society for Computational Biology (ISCB).
Integrating multiple information sources raises difficult questions for HCI: how to integrate the human expertise of biologists in a way that will be efficient and take advantage of what the human mind can do better than a computer? The answer to this question lies in a dialog between biologists and computer scientists. Biologists need to learn about the limitations and strengths of computers, whereas computer scientists need to understand the physical entities that the various data sources describe. It will be interesting to observe attempts to integrate the expertise of librarians and information professionals in the quest for federated meta-search bioinformatics tools. Bioinformatics research into the use of ontologies could benefit from the knowledge organization, classification and cataloguing theory that have traditionally been the domain of library and information science.
Table 1. Examples of Ontologies and Controlled Vocabularies in Bioinformatics (for a longer list of evaluated ontologies see Köhler and Schulze-Kremer 2002).
| Name | Description/Purpose | Reference URL |
| macromolecular Crystallographic Information File (mmCIF) | controlled vocabulary for macromolecular structure, used by Protein Data Bank (PDB) | http://mmcif.pdb.org/ |
| Unified Medical Language System (UMLS) | medical information, ‘metathesaurus’ combines over 100 source vocabularies (e.g. MeSH, RxNorm [clinical drugs], etc.) | http://www.nlm.nih.gov/research/umls/ |
| Human Genome Organization (HUGO) Nomenclature | genomics – human gene symbols and names, used by GDB Human Genome Database | http://www.gene.ucl.ac.uk/nomenclature/ |
| Gene Ontology (GO) | controlled vocabulary (IsA hierarchy is not formal) to describe gene product attributes, actually uses three ontologies: molecular function, cellular component and biological process. | http://www.geneontology.org/ |
| TAMBIS Ontology (TaO) | describes the metadata of many underlying data sources, representing an over-arching universal schema. | http://imgproj.cs.man.ac.uk/tambis/details.html http://www.cs.man.ac.uk/~stevensr/tambis-oil.html |
| EcoCyc | domain specific to Escherichia coli, includes detailed expression of metabolic and signaling pathways, reactions, enzymes, genes, tRNA, etc. | http://ecocyc.pangeasystems.com |
| NCBI Entrez Taxonomy | the names of all organisms that are represented in the genetic databases with at least one nucleotide or protein sequence | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy |
Table 2. Examples of concepts expressed with TAMBIS Ontology using GRAIL and OIL (Baker et al 1999, 517; Stevens et al. 2002, 137)
| concept | GRAIL | relevant sources |
| tertiary structures of proteins which contain motifs that are involved in hydrolase activity | TertiaryStructure which isStructureOf (Protein which hasComponent (Motif which indicatesFunction Hydrolase)) | SWISS-PROT, PROSITE, CATH, EMP |
| ESTs that code for proteins that contain glycosylation sites. | EST which codesFor (Protein which hasComponent GlycosylationSite) | dbEST |
| concept | OIL |
| definition of protein class and its subclasses: enzyme (and its subclass holoenzyme) and holoprotein. | class-def protein class-def defined holoprotein subclass-of protein slot-constraint binds has-value prosthetic-group class-def defined enzyme subclass-of protein slot-constraint catalyses has-value reaction class-def defined holoenzyme subclass-of enzyme slot-constraint binds has-value prosthetic-group |
Glossary
controlled vocabulary – “An established list of preferred terms from which a cataloger or indexer must select when assigning subject headings or descriptors…” (ODLIS)
database schema – describes categories and their data types in a database.
knowledge base – “the combination of an ontology with associated instances” (Stevens et al 2000, 401)
metadata – “Structured information used to describe information resources/objects for a variety of purposes […] Metadata can be categorized as descriptive, structural, and administrative. Descriptive metadata facilitates discovery, identification, and selection. Structural metadata describes the internal structure of complex objects. Administrative metadata aids in the management of resources and may include rights management metadata, preservation metadata, and technical metadata describing the physical characteristics of a resource.” (ODLIS)
metathesaurus – combines many thesauri into one.
ontology – “concise and unambiguous description of principle relevant entities with their potential valid relations to each other.” (Schulze-Kremer)
semiote – “a semiote is a symbol denoting the semantics of a useful elementary part of an idea, datum or computation” (Kazic, 2000, 1135)
taxonomy – a system of superclass and subclass relations.
thesaurus – “refers to an alphabetically arranged lexicon of terms comprising the specialized vocabulary of an academic discipline or field of study, showing the logical and semantic relations among terms, particularly a list of subject headings or descriptors used as preferred terms in indexing the literature of the field.” (ODLIS)
topology – “The way in which constituent parts are interrelated or arranged.” [14]
Works Cited
Baker Patricia G., Goble Carole A., Bechhofer Sean, Paton Norman W., Stevens, R., Brass, A. “An ontology for bioinformatics applications.” Bioinformatics. 15.6 (1999): 510-520. 29 Mar. 2005. <http://bioinformatics.oupjournals.org/ >
Bunge, Mario. Philosophical Dictionary. Enl. Ed.
Clark Tim, Martin Sean, Liefeld Ted. “Globally distributed object identification for biological knowledgebases.” Briefings in Bioinformatics 5.1 March (2004) : 59-70.
Critchlow Terence, Musick Ron, Slezak Tom. “Experiences applying meta-data to bioinformatics.” Information Sciences 139 (2001): 3-17.
Fenstermacher, David. “Introduction to Bioinformatics.” Journal of the American Society for Information Science and Technology. 56.5 (2005): 440-446.
Hester M. Wain, Elspeth A. Bruford, Ruth C. Lovering, Michael J. Lush, Mathew W. Wright and Sue Povey. “Guidelines for Human Gene Nomenclature.” Genomics 79(4):464-470 (2002). 2 Apr. 2005. <http://www.gene.ucl.ac.uk/nomenclature/guidelines.html>
Kazic, Toni. “Semiotes: a semantic for sharing.” Bioinformatics. 16.12 (2000): 1129-1144. 1 Apr. 2005. <http://bioinformatics.oupjournals.org/cgi/reprint/16/12/1129>
Köhler, Jacob & Schulze-Kremer, Steffen. “The Semantic Metadatabase (SEMEDA): Ontology based integration of federated molecular biological data sources.” In Silico Biology 2.21 (2002). 7 Apr. 2005. <http://www.bioinfo.de/isb/2002020021/>
MacMullen, W. John, Denn, Sheila O. “Information problems in molecular biology and bioinformatics.” Journal of the American Society for Information Science and Technology. 56.5 (2005): 447-456.
Reitz, Joan M. ODLIS: Online Dictionary for Library and Information Science. Libraries Unlimited, 2004.
Schulze-Kremer, Steffen. “Ontologies for molecular biology and bioinformatics.” In Silico Biology. 2.17 (2002). 5 Apr. 2005. <http://www.bioinfo.de/isb/2002/02/0017/main.html>
Stevens Robert, Goble Carole A., Bechhofer Sean. “Ontology based knowledge representation for bioinformatics.” Briefings in Bioinformatics. 1.4 (2000): 398-414.
Stevens Robert, Goble Carole, A, Horrocks Ian, Bechhofer Sean. “Building a Bioinformatics Ontology Using OIL.” IEEE Transactions on Information Technology in Biomedicine. 6.2 (2002): 135-141. 6 Apr. 2005. <http://ieeexplore.ieee.org/>
Wain, H.M, Lush M, Ducluzeau, F, Povey, S. “Genew: The Human Gene Nomenclature Database.” Nucleic Acids Research 30.1 (2002): 169-171. 2 Apr. 2005. <http://nar.oupjournals.org/cgi/content/full/30/1/169>
Westbrook, John D. & Bourne Philip E. “STAR/mmCIF: An Extensive Ontology for Macromolecular Structure and Beyond”. Bioinformatics (2000) 16(2), 159-168. 6 Apr. 2005. <http://bioinformatics.oupjournals.org/cgi/reprint/16/2/159>.
Wroe Chris, Stevens Robert, Goble Carole, Roberts Angus,
[1] About HGNC – History. 2004. Human Genome Organisation. 1 Apr. 2005. <http://www.gene.ucl.ac.uk/nomenclature/aboutHGNC.htm>
[2] HUGO. 2004. Human Genome Organisation. 2 Apr. 2005. <http://www.gene.ucl.ac.uk/hugo/>
[3] The Internet Classics Archive – Aristotle’s Categories, part 4. 2000. Daniel C. Stevenson. 2 Apr. 2005. <http://classics.mit.edu/Aristotle/categories.1.1.html>
[4] NCBI Taxonomy Browser.
[5] Open Biomedical Ontologies. SourceForge. 5 Apr. 2005. <http://obo.sourceforge.net/>
[6] Gene Ontology Consortium. “The Gene Ontology (GO) database and informatics resource.” Nucleic Acids Research 32, Database issue D258-D261 (2004). 3 Apr. 2005. <http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D258>
[7] Gene Ontology – A Bibliography - Annotations using GO. 4 Apr. 2005. Gene Ontology Consortium. 7 Apr. 2005. <http://www.geneontology.org/GO.biblio.shtml#annots>
[9] RCSB Protein Data Bank. Research Collaboratory for Structural Bioinformatics (RCSB). 5 Apr. 2005. <http://www.rcsb.org/pdb/>
[10] myGRID. 2004. Engineering and Physical Sciences Research Council (EPSRC). 13 Apr. 2005. <http://www.mygrid.org.uk/>
[11] see Table 2 of this paper for TaO examples in GRAIL and OIL
[12] OilEd. 17 Sep. 2002. Sean Bechhofer & Gary Ng. 6 Apr. 2005. <http://oiled.man.ac.uk>
[13] Eighth Annual Bio-Ontologies Meeting. 24 Jun. 2005. Robert Stevens, Robin McEntire, Phillip Lord, and James A. Butler. 11 Apr. 2005. <http://bio-ontologies.man.ac.uk/>
[14]