Dr. Işık Barış Fidaner

One should clearly distinguish “GO-terms as categories” from “GO-terms as elements” being categorized. The first role is given by the Gene Ontology (GO) itself, whereas the second is given by GO annotations.

In ontological entropy these two roles that GO-terms play are (1) conceptually kept entirely separate, and (2) combined in a common context of computation.

Ontological entropy measures how much one (larger) set of GO-terms categorizes another (smaller) set of GO-terms: how much the former divide and separate the latter, how much they diversify them.

*Ontological entropy* is a measure of *diversity* for a given set of *elements* over a given set of *categories*. When a set of elements is categorized (in various respects), one can ask questions about the elements’ diversity (the amount of variety they represent) over these categorizations, and such questions can be answered by computing ontological entropies.

An *element* simply refers to an object, to a single object or to one object among many; it refers to what is found there. GO-terms have a status of being-found-there, a status of being an element, when they are used as gene annotations.

A *category* is what differentiates between the elements, what divides and separates the elements. In GO relations (“is_a” etc.) GO-terms have this second status of being a category that differentiates between the elements.

Note that since GO-terms may take on either role (element or category) depending on the context, it’s a conceptual necessity to differentiate being-a-category from being-an-element: *GO-terms as categories differentiate between GO-terms as elements.*

*Diversity* refers to the amount of variety. It’s very simple: When the categories don’t divide and separate the elements, there is no variety, no diversity. The more the categories divide and separate the elements, the more there is variety, the more diverse the elements are over those categories. Ontological entropy is the numerical measure of this diversity.

Intuitively, ontological entropy is a positive scalar numerical answer to the abstract question “How diverse is this set of elements over these categories?” Let’s explain this abstract question by a concrete example.

Consider these leaves as a set of elements. One categorization that divides and separates these elements might be the shapes of the leaves. The question regarding that categorization is “How diverse are these leaves over their shapes?” After actually categorizing the leaves over their shapes, one can answer this by computing an ontological entropy over the shapes. Another categorization might be their colours: “How diverse are these leaves over their colours?” After actually categorizing the leaves over their colours, one can answer this by computing another ontological entropy over the colours. A third categorization might be the combination of the first two categorizations: “How diverse are these leaves over their shapes and colours?” This third question can be answered by computing another ontological entropy that’s actually the average of the first two entropies.
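The leaf example can be sketched numerically. The snippet below is only an illustration under stated assumptions: the four leaves and their labels are invented, and the per-category entropy is assumed to be -p·log(p), where p is the fraction of elements in the category (a form consistent with the values quoted later in this text). The entropy over a categorization is taken as the average over its categories.

```python
import math
from collections import Counter

# Hypothetical data: each leaf is a (shape, colour) pair.
leaves = [("oval", "green"), ("oval", "red"), ("lobed", "green"), ("lobed", "green")]

def category_entropies(labels):
    """One entropy value per category, assuming the form -p * log(p)."""
    n = len(labels)
    return [-(k / n) * math.log(k / n) for k in Counter(labels).values()]

def mean(values):
    return sum(values) / len(values)

shape_terms = category_entropies([shape for shape, _ in leaves])
colour_terms = category_entropies([colour for _, colour in leaves])

over_shapes = mean(shape_terms)                  # diversity over shapes
over_colours = mean(colour_terms)                # diversity over colours
over_both = mean(shape_terms + colour_terms)     # diversity over shapes and colours

print(over_shapes, over_colours, over_both)
```

In this toy data both categorizations have two categories, so the combined value coincides with the plain average of the two separate entropies. With unequal numbers of categories, averaging over all single categories would weight each categorization by its number of categories.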

Ontological entropies constitute a very important information metric for scientific work because sets of elements with more diversity (with respect to certain established categories) always *need more attention* by their nature (see the example application below). Whenever a set of elements is categorized in one way or another, a variable amount of diversity inherently emerges from the natural combinations of the elements as an immediate side-effect of the categorization. Making the actual categorizations over observed sets of elements is only one part of the scientific work. The real difficulty of science concerns the **appropriateness** of the categorizations, and the appropriateness of categories is naturally determined by questions of diversity.

**Example Application of Ontological Entropy**

Which genes have the most diverse sets of annotations over biological processes in *Schizosaccharomyces pombe*?

Which genes have the most diverse sets of annotations over biological processes in *Saccharomyces cerevisiae*?

**The computation of ontological entropy**

Entropy for a single element is always zero. It’s simply because a single element cannot be divided by any category (and log(1) = 0). Entropy for two elements takes its non-zero value (log(2)/2) only over the categories that divide and separate the two elements. Entropy for three elements takes its non-zero values (log(3)/3 and (2/3)log(3/2)) only over the categories that divide the set and separate either 1 or 2 of the elements. Similarly, entropy for a set of n elements takes its non-zero values only over the categories that divide the set of n elements and separate subsets of 1, 2, …, n-2 or n-1 elements. Entropy is zero for categories that include 0 or n of the elements.
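The values quoted above are all consistent with a per-category entropy of the form -(k/n)·log(k/n), where k is the number of elements (out of n) that the category includes. That closed form is inferred from the listed values rather than stated in the text, so the following is a minimal sketch under that assumption; the function name `category_entropy` is hypothetical.

```python
import math

def category_entropy(k, n):
    """Entropy of one category that includes k of n elements.

    Assumed form: -(k/n) * log(k/n), taken as 0 when k is 0 or n.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if k == 0 or k == n:
        return 0.0  # the category does not divide the set
    p = k / n
    return -p * math.log(p)

# Values for sets of one, two and three elements, for every possible k.
for n in (1, 2, 3):
    print(n, [round(category_entropy(k, n), 4) for k in range(n + 1)])
```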

The most basic ontological entropy is the one computed over a single category that divides a set of n elements. This entropy takes its maximum value when the elements split about 37%-63% into categorized and uncategorized (the exact categorized fraction is 1/e for theoretical reasons). Entropy is zero when all or none of the elements belong to the category. In the first case the category is too broad to bring any distinction to the given set of elements; in the second it does not touch them at all.
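The 1/e figure can be checked with a short derivation, assuming the single-category entropy has the form h(p) = -p ln p, where p is the fraction of elements that belong to the category (the symbols h and p are introduced here for illustration):

```latex
\begin{align*}
h(p)   &= -p \ln p, \qquad 0 < p \le 1 \\
h'(p)  &= -\ln p - 1 = 0 \;\Longleftrightarrow\; p = \tfrac{1}{e} \approx 0.368 \\
h''(p) &= -\tfrac{1}{p} < 0 \quad \text{(so the stationary point is a maximum)} \\
h\!\left(\tfrac{1}{e}\right) &= \tfrac{1}{e} \approx 0.368
\end{align*}
```

Under this assumption the maximum is reached when roughly 37% of the elements fall inside the category and 63% outside, and h(p) tends to 0 as p approaches 0 or 1.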

Ontological entropy over a given set of categories is the average of the entropies over those single categories. The standard ontological entropy is computed over the set of all subcategories of a root category, including the root itself (such as “biological process” in GO, to answer “How diverse is this set of annotations over biological processes?”). To compute a more specific ontological entropy over a smaller set of categories, one can choose a lower root category (e.g. the subcategories of “negative regulation of biological process”, including itself, to answer “How diverse is this set of annotations over negative regulations of biological processes?”).
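The averaging over the subcategories of a root can be sketched on a toy ontology. Everything in the snippet is an assumption made for illustration: the five-term `IS_A` hierarchy is invented (it is not real GO data), an element is taken to belong to a category when it is that category or one of its descendants, and the per-category entropy is assumed to be -p·log(p).

```python
import math

# Hypothetical toy ontology: term -> list of parents ("is_a" edges).
IS_A = {
    "process": [],
    "metabolism": ["process"],
    "signaling": ["process"],
    "lipid metabolism": ["metabolism"],
    "sugar metabolism": ["metabolism"],
}

def ancestors_or_self(term):
    """The term together with every category reachable through is_a."""
    seen, stack = {term}, [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def subcategories(root):
    """All categories under the root, including the root itself."""
    return [term for term in IS_A if root in ancestors_or_self(term)]

def ontological_entropy(elements, root):
    """Average of the assumed single-category entropies under the root."""
    n = len(elements)
    categories = subcategories(root)
    total = 0.0
    for category in categories:
        k = sum(1 for e in elements if category in ancestors_or_self(e))
        if 0 < k < n:  # only categories that actually divide the set count
            total += -(k / n) * math.log(k / n)
    return total / len(categories)

# GO-terms as elements: the (hypothetical) annotations of one gene.
annotations = ["lipid metabolism", "sugar metabolism", "signaling"]
print(ontological_entropy(annotations, "process"))     # standard root
print(ontological_entropy(annotations, "metabolism"))  # lower, more specific root
```

Choosing a lower root changes only the set of categories averaged over; the set of elements stays the same.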

Ontological entropy is computed over a given set of categories that divide and separate a given set of elements, and it’s a measure of the diversity caused by this division and this separation. When there is a more diverse set of elements, the categories divide/separate them by a greater amount, and the ontological entropy becomes greater.

**The use of ontological entropy**

Since (1) ontological entropy is a numerical measure of diversity, and (2) diversity is a natural measure for the amount of information found in a set of observations, and (3) information is always processed by paying attention, (4) *greater ontological entropy always means greater need of attention* (according to the established categorizations).

But since ontological entropy (and diversity) obtains its meaning only with respect to the established categorizations, it’s also possible to decide that those categorizations are inappropriate and modify them. In that case, the entropies and diversities will also change and the *need for attention* will be reoriented according to the new categorizations.

~~~

Ontological entropy is a special kind of *projection entropy*.

For more theoretical context, see REBUS 2.0