What the Heck is an Ontology?

Introduction

The word 'ontology' is now a fashionable term among software developers, but the first time that you hear it you are likely to wonder what it means. It is a word whose origin lies in ancient Greek philosophy ('the most general branch of metaphysics, concerned with the nature of being'), but in recent years it has been borrowed and re-applied to computer science and software development. Over the last 10 years or so, interest in ontologies has grown from an emerging academic research topic into a mainstream concern. This rise is largely due to the explosion of the Internet and the need for greater interoperability among computers and programs. For example, Tim Berners-Lee, widely hailed as the 'inventor' of the World Wide Web, described an ontology layer in his vision of the Semantic Web.

A Frequently-Asked Question

But what the heck is an ontology, and what can it do for you?

This is not an easy question to answer directly, so first I would like to answer it in a very indirect way. Imagine that you have been held in cryonic suspension until the year 2504. You wake to find that humans are no longer the only intelligent life form on Earth; there are all kinds of aliens living here too - some furry, some scaly, some with three eyes, and some that hop around on one leg. After five centuries of sleep you wake up feeling ravenous and would like to ask for food, but you are surrounded only by aliens speaking a very strange language. So, where do you start? How do you convey your feeling of hunger and the need for food to another being whose perception of the world is very different?

Now imagine a different scenario. You are an advanced, intelligent software robot acting on behalf of your programmer-creator, trying to find the best deal for a holiday in Fiji. You know about an Internet web service that offers good prices, and decide to contact it to see what it can offer. The problem is, the web service is different to anything you've seen before and you're not sure if it will understand your request.

These two scenarios share some common features. In both cases, an agent (either a person or a program) is trying to communicate with another agent to service a simple request. However, in both cases the problem is the same: a lack of communication and mutual understanding. The problem lies not at the level at which messages are sent and received, but at the level at which they are processed and understood. In the first scenario, it would be easy enough to ask for food, but would the word 'food' be understood? And perhaps more importantly, would the concept of food be understood? (It is not obvious that an alien would ingest food in the same way that we do.) In other words, effective communication between agents demands a common terminology (or language), and a common semantics (or meaning). That's where an ontology comes in. An ontology is a machine-readable representation of a domain's terminology and the relationships among the terms in the domain.
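To make the idea of a machine-readable representation a little more concrete, here is a minimal sketch in Python. It assumes a toy 'food' domain and stores the ontology as a set of (subject, relation, object) triples. The term and relation names are invented for illustration and do not come from any published ontology.

```python
# A toy ontology for the 'food' domain: the terms, plus the relationships among them.
# Every name below is an illustrative assumption, not part of a real ontology.
TRIPLES = {
    ("Fruit", "is_a", "Food"),
    ("Apple", "is_a", "Fruit"),
    ("Human", "is_a", "Agent"),
    ("Human", "eats", "Food"),
}


def terms(triples):
    """Return the terminology: every term used as a subject or an object."""
    return {term for s, _, o in triples for term in (s, o)}


def related(triples, subject, relation):
    """Return the terms linked to `subject` by `relation`."""
    return {o for s, r, o in triples if s == subject and r == relation}
```

A program that holds this structure can answer simple questions such as `related(TRIPLES, "Human", "eats")` without that knowledge being buried in application code.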

Other Definitions of 'Ontology'

There are plenty of other definitions of ontology out there, and different people have their own opinions on what constitutes an ontology.

Here are some examples:

"an ontology is a set of concepts - such as things, events, and relations - that are specified in some way - in order to create an agreed-upon vocabulary for exchanging information", whatis.com
"An ontology is a shared understanding of some domain of interest", Mike Uschold and Michael Gruninger, 1996.
"An ontology is a formal, explicit specification of a conceptualization", Thomas Gruber, 1993.

The last of these definitions was written over ten years ago and is probably the most commonly cited. I believe it is a good definition in the sense that it accurately and concisely defines an ontology, but also a bad definition because it is so 'user-unfriendly'. In the definition, a conceptualization is an abstract model of some phenomenon in the world. An ontology is a specification of a conceptualization because it is not (necessarily) the final representation that will be incorporated into a software system, but a more abstract model that is defined early in the software development process and maintained independently of the software. An ontology is explicit because the concepts used and the constraints on their use are explicitly defined, rather than being implicitly defined in software. Finally, an ontology is formal because it should be machine-readable.

When you commit to an ontology, you accept the contents of the ontology, and promise to use the terminology that it prescribes and uphold the relationships that it contains. For effective and reliable communication between two agents, both should commit to the same ontology over the domain of discourse. In other words, if the agents don't share a mutual understanding of the domain, then misunderstandings are likely to occur. Such misunderstandings are due to so-called ontological mismatches. Ontological mismatches are most difficult to detect when the agents' ontologies are similar but subtly different. Detecting and repairing such mismatches is a difficult problem, and still a current research topic.
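As a rough illustration of an ontological mismatch, the following Python sketch compares two hypothetical agents' views of the holiday domain. It assumes each ontology is simply a mapping from a term to its parent concept. This is a deliberate simplification: it only catches surface differences, whereas real mismatch detection is much harder.

```python
def find_mismatches(ontology_a, ontology_b):
    """Compare two {term: parent} mappings and report where they disagree."""
    shared = ontology_a.keys() & ontology_b.keys()
    return {
        # Terms that only one of the two agents knows about.
        "only_in_a": sorted(ontology_a.keys() - ontology_b.keys()),
        "only_in_b": sorted(ontology_b.keys() - ontology_a.keys()),
        # Terms both agents use, but classify differently.
        "conflicting": sorted(t for t in shared if ontology_a[t] != ontology_b[t]),
    }


# Hypothetical ontologies for the software robot and the web service.
robot = {"Holiday": "Product", "Flight": "Transport", "Hotel": "Accommodation"}
service = {"Holiday": "Product", "Flight": "Product", "Resort": "Accommodation"}

report = find_mismatches(robot, service)
```

Here both agents use the term 'Flight' but classify it differently, which is exactly the kind of similar-but-subtly-different case that is easy to overlook.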

How to Recognise an Ontology

It would be easy to spend too long on formal or wordy definitions of ontology, but an alternative, and perhaps more productive, definition of an ontology emerges by considering how to recognise one. I believe a knowledge model is an ontology if, when considering it, you can answer 'yes' to all of the following questions:

Is it a declarative, explicit representation of a domain?
Is it machine-readable?
Is it consensual? In other words, have many people critiqued, revised and agreed upon the contained terms and relationships?
Can it be used to solve more than one problem in the domain?
Will it be used in multiple applications?
Is it stable (i.e. changes little over time) and long-lived?

It may be that you already have something that passes all of these criteria, but currently goes under another name, such as a conceptual domain model or a schema specification. If this is the case, you have already been using sound software development practices to reap the benefits of ontologies under the guise of a different name. So, what are those benefits?

What Can an Ontology Do for You?

The second half of the frequently-asked question cited earlier is what ontologies can do for you. Why are they interesting, and why should you bother?

I think there are two main motivations for developing, applying and sharing ontologies. The first is the joy of discovery, and the aim to push the boundaries of computing by recording in an explicit and declarative way knowledge that might otherwise have been bound up in the details of system code. In so doing, we furnish computer programs with bodies of knowledge that can be inspected by humans, interrogated by other programs, and introspected by the computer agents themselves. The second motivation is much more pragmatic - the longer-term prospect of a financial return on investment. The technologies used for building computer systems change frequently. In contrast, a domain representation in the form of an ontology is much more "technology neutral" and freer of application concerns. It provides a common terminology for system developers and therefore helps to avoid misunderstandings among members of the team. An ontology is a long-lived conceptual model that can be used in multiple applications, providing good opportunities for reuse and interoperability. Lastly, but no less importantly, you can share your ontology with potential customers. Customers will soon recognise that your ontology contains a representation of 'their' domain, so it will increase the appeal of your products or services. It will also help them to identify and understand the differences between your view of the domain and theirs.

If you are planning to use an ontology in your project, one of the first things to consider is whether there is already an existing ontology that matches your requirements. If one does exist, then you can save the effort of developing one from scratch and may be able to benefit from an exchange of knowledge between your application and other applications that commit to the same ontology.

You might find that you can make more effective reuse of concepts across diverse applications if they share the same terminology and semantics for the most general concepts. In other words, you may find it useful to generalize from specific to more abstract concepts to aid reuse: the more general a concept is, the more reusable it is likely to be. As general concepts tend to sit towards the top of inheritance hierarchies, collections of these general concepts are referred to as upper ontologies. Some well-known upper ontologies are Cyc (and its open source version, OpenCyc) and the Suggested Upper Merged Ontology, SUMO. (You can find out more about upper ontologies from the IEEE Standard Upper Ontology Working Group.) WordNet is also a useful resource, but is probably more useful as an aid to the development of ontologies than as an ontology to be used directly.
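The idea that general concepts sit towards the top of an inheritance hierarchy can be sketched in a few lines of Python. The hierarchy below is a made-up example (it is not taken from Cyc or SUMO), and the function names are my own. The sketch walks up the hierarchy to find the most specific concept that two terms have in common.

```python
# Hypothetical hierarchy: each term maps to its parent; the root has no parent.
PARENT = {
    "Entity": None,              # assumed top-level (upper ontology) concept
    "PhysicalObject": "Entity",
    "Event": "Entity",
    "Vehicle": "PhysicalObject",
    "Aircraft": "Vehicle",
    "Hotel": "PhysicalObject",
    "Flight": "Event",
}


def ancestors(term):
    """Return the chain from `term` up to the root, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def common_generalisation(a, b):
    """Return the most specific concept that both terms specialise."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return None
```

Two applications that disagree about 'Aircraft' and 'Hotel' can still agree that both are physical objects, which is why the upper levels of the hierarchy tend to be the most reusable.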

Related Products and Technologies

When I said earlier that ontologies are technology neutral, I was not wrong, but I wasn't telling the whole story either. The problem is that if you say that an ontology must be machine-readable, then you must also decide how to represent it in a machine, and choose a technology to support that representation. In other words, you can never completely escape the technology issue. However, this doesn't destroy the argument of technology neutrality, as the ontology representation and associated technologies are completely divorced from the technologies used for the final software application.

Figure 1. The Protégé Ontology Editor

Ontologies are developed and maintained using a specialist ontology editor, such as OilEd, OntoEdit, or Protégé (see Figure 1). These tools (which can be downloaded for free) provide support for the definition and validation of ontologies, and also allow them to be shared. A common machine-readable language is needed to share the ontologies themselves, and there are several available, including RDF, DAML+OIL and OWL. The Resource Description Framework, RDF, is a W3C technology that was devised to describe resources on the Internet, but can also be used to exchange ontological knowledge. It was an early technology and its primitives are not as expressive as those of more recent initiatives. DAML+OIL is the DARPA Agent Markup Language combined with the Ontology Inference Layer. It is an ontology language specifically designed for use on the Web, and builds on RDF by supporting richer, but still decidable, modelling primitives. OWL, the Web Ontology Language, is more recent still and comes in three variants with different levels of expressiveness. In order of increasing expressiveness, these are OWL Lite, OWL DL and OWL Full.
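To give a feel for what exchanging an ontology looks like, here is a small Python sketch that writes two classes and one subclass relationship as N-Triples, a simple line-based W3C serialisation of RDF. The `http://example.org/travel#` namespace and the class names are hypothetical; the RDF, RDFS and OWL identifiers are the standard W3C ones. In practice you would normally let an ontology editor or an RDF library produce this output rather than build it by hand.

```python
# Standard W3C vocabulary identifiers.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Hypothetical namespace for an illustrative travel ontology.
EX = "http://example.org/travel#"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = [
    (EX + "Holiday", RDF_TYPE, OWL_CLASS),
    (EX + "BeachHoliday", RDF_TYPE, OWL_CLASS),
    (EX + "BeachHoliday", RDFS_SUBCLASS_OF, EX + "Holiday"),
]

print(to_ntriples(triples))
```

Each output line is one statement: the first two declare classes, and the third says that every beach holiday is also a holiday.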

Summary

With the explosion of the Internet grew the need to share and reuse networked knowledge. Ontologies aim to fulfil that need by facilitating the noble goals of sharing, reuse and interoperability. They achieve this by moving the focus of systems design away from technology, and towards a solution that works at the levels of both technology and knowledge. This is an important step, and is necessary if we wish to make the best use of the electronic knowledge available to us.

Simon White