In Fingerprint Similarity and Entropy, we performed two tests. We first added an alcohol to a set of sugars, and then added a sugar to a set of alcohols. In both cases, entropy increased due to the outlier compound.
The change in entropy looks similar, but (as we said) the two tests are asymmetrical, because the sugar fingerprints have many more features than the alcohol fingerprints.
In both cases, an outlier causes an increase in entropy, but for different reasons: an alcohol is an outlier among the sugars due to a lack of features, whereas a sugar is an outlier among the alcohols due to an excess of features. This difference can be seen in their cumulative occurrence distributions (COD).
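To make the COD concrete, here is a minimal sketch. It assumes that the k-th COD value counts the features occurring in at least k of the fingerprints. The function name and the toy fingerprints are hypothetical stand-ins, not the real sugar and alcohol data.

```python
from collections import Counter

def cumulative_occurrence_distribution(fingerprints):
    """Return [phi_1, ..., phi_n], where phi_k counts the features
    that occur in at least k of the n fingerprints (assumed definition)."""
    n = len(fingerprints)
    # Count how many fingerprints contain each feature.
    occurrences = Counter(f for fp in fingerprints for f in set(fp))
    return [sum(1 for c in occurrences.values() if c >= k)
            for k in range(1, n + 1)]

# Hypothetical toy fingerprints: sets of feature labels, not real chemistry.
rich = [{"a", "b", "c", "d"}, {"a", "b", "c", "d"}, {"a", "b", "c", "d"}]
poor = [{"a"}, {"a"}, {"a"}]

# A feature-poor outlier joins a feature-rich set: the COD drops at k=4.
cod_lack = cumulative_occurrence_distribution(rich + [{"a"}])

# A feature-rich outlier joins a feature-poor set: the COD is high only at k=1.
cod_excess = cumulative_occurrence_distribution(poor + [{"a", "b", "c", "d"}])
```

With this toy data, `cod_lack` is `[4, 4, 4, 1]` and `cod_excess` is `[4, 1, 1, 1]`, which mirrors the two shapes discussed here.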
When an alcohol is added to the 3 sugars, the resulting COD shows a fall at k=4, which means that several features occur in only 3 of the 4 compounds.
There are few 4-element features and many more 3-element features because the new alcohol lacks features that the sugars share. Each 3-element feature has positive entropy, so the projection entropy increases.
On the other hand, when a sugar is added to the 3 alcohols, the resulting COD shows a peak at k=1, which means that several features occur in only 1 of the 4 compounds.
This is because the new sugar has features that none of the alcohols have. Each 1-element feature has positive entropy, so the projection entropy increases.
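Why 3-element and 1-element features raise the entropy while 4-element features do not can be checked numerically. The sketch below assumes that a feature shared by k of n elements contributes (k/n) * log(n/k) to the entropy. The helper name `feature_entropy` is hypothetical.

```python
import math

def feature_entropy(k, n):
    """Entropy contribution of one feature shared by k of n elements,
    assuming the per-feature term (k/n) * log(n/k)."""
    return (k / n) * math.log(n / k)

n = 4  # three original compounds plus one outlier

shared_by_all = feature_entropy(4, n)    # a 4-element feature: log(1) = 0
shared_by_three = feature_entropy(3, n)  # a feature the outlier lacks
shared_by_one = feature_entropy(1, n)    # a feature only the outlier has
```

Under this assumption, a feature present in every compound contributes nothing, while both kinds of partial features contribute a strictly positive amount.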
In the general case, an outlier will both lack some features of the existing set and have some features that the existing set does not. Thus one can observe both phenomena increasing the entropy together.
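The general case can be sketched with an outlier that both lacks shared features and brings new ones. This again assumes the entropy is the sum of (k/n) * log(n/k) over all features. The fingerprints are hypothetical toy sets.

```python
import math
from collections import Counter

def entropy(fingerprints):
    """Sum of (k/n) * log(n/k) over features, where k is the number of
    fingerprints containing the feature (assumed definition)."""
    n = len(fingerprints)
    occurrences = Counter(f for fp in fingerprints for f in set(fp))
    return sum((k / n) * math.log(n / k) for k in occurrences.values())

# Hypothetical data: three identical fingerprints and one mixed outlier.
base = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "c"}]
outlier = {"a", "x", "y"}  # lacks b and c, adds x and y

h_before = entropy(base)             # every feature is shared by all
h_after = entropy(base + [outlier])  # b, c become 3-element; x, y are 1-element

# Split the increase into the part due to lack and the part due to excess.
lack_part = 2 * (3 / 4) * math.log(4 / 3)
excess_part = 2 * (1 / 4) * math.log(4)
```

Here `h_before` is zero, and `h_after` equals `lack_part + excess_part`, so both effects add up in the same total.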
Fidaner, I. B. & Cemgil, A. T. (2013). Summary Statistics for Partitionings and Feature Allocations. In Advances in Neural Information Processing Systems, 26.