KeyComparer: Bringing the expressive power of Select to IComparer<T>

A frequent requirement in object-oriented programming is to sort a sequence of elements according to a projected key. This key typically corresponds to a property of the element type, although it could also be a projection of any aspect of its state, including functions over fields, properties, and methods. For example, suppose that we have the following class:

public partial class Person
{
    public string Ssn { get; set; }
    public DateTime DateOfBirth { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

…and an array of its instances:

var persons = new []
{
    new Person { Ssn = "324-00-3015", DateOfBirth = new DateTime(1970, 12, 17), FirstName = "Tod",     LastName = "Temme" },
    new Person { Ssn = "548-00-1592", DateOfBirth = new DateTime(1968, 03, 13), FirstName = "Lucia",   LastName = "Armstrong" },
    new Person { Ssn = "129-00-7416", DateOfBirth = new DateTime(1982, 09, 02), FirstName = "Spencer", LastName = "Weaver" },
    new Person { Ssn = "831-00-6391", DateOfBirth = new DateTime(1974, 04, 30), FirstName = "Celia",   LastName = "Potter" },
    new Person { Ssn = "714-00-6502", DateOfBirth = new DateTime(1966, 11, 19), FirstName = "Powell",  LastName = "Beck" },
};

We might have scenarios where we want such a collection to be sorted by SSN, by date of birth, or by a concatenation of the first and last name.

Sorting in the .NET Framework

The .NET Framework Class Library exposes various sorting facilities: Array.Sort, List<T>.Sort, Enumerable.OrderBy, ParallelEnumerable.OrderBy, …. It also defines a number of sorted collection types that maintain their elements in sort order, such as SortedSet<T>, SortedList<TKey, TValue>, and SortedDictionary<TKey, TValue>. These collections allow elements to be inserted and removed more efficiently than if one were to sort the entire list each time. For example, SortSet<T> operations generally take O (log n) time, whilst a full sort would take O (n log n). The aforementioned three sorted collection types do not allow duplicate elements, although it is not inconceivable to implement a collection type that does.

IComparable<T>

Whilst the above types provide the implementation for the sorting algorithm or data structure, they by default rely on the element type to provide a means of determining the sort order for any given pair of elements. This is achieved through the IComparable<T> interface, which is to be implemented by types that have an inherent sort order. Various types have such an inherent order:

  • integers sort by the numerical order (-43, 2, 81, 122)
  • strings sort by the alphabetical order ("cat", "caterpillar", "mouse")
  • versions sort by the components (2.9.2, 2.10.0, 3.0.0)

The interface has a non-generic counterpart, IComparable, which should also be implemented by types alongside IComparable<T>, but only intended for use as a fallback by consumers who cannot avail of generics.

The IComparable<T> interface defines just one method to be implemented: CompareTo(T). When called on an instance of type T, with an argument specifying another instance of the same type, the CompareTo method should return an integer value as follows:

Value Meaning
Less than zero This instance precedes the specified other in the sort order.
Zero This instance occurs in the same position in the sort order as the specified other.
Greater than zero This instance follows the specified other in the sort order.

Whilst the overly permissive return values seem to be a minor infringement of Postel’s law, they provide for efficient implementations by numeric types. Consider the internal implementation for the Byte value type, which is a simple subtraction:

public partial struct Byte : IComparable<Byte>
{
    public int CompareTo(Byte value)
    {
        return this - value;
    }
}

This insight serves as a convenient way of remembering the method’s semantics: x.CompareTo(y) returns a value of the same sign as if one computed x - y for a numeric type.

In custom types, the IComparable<T> interface is commonly implemented by forwarding the CompareTo calls to a key property of the type, or some other projection of its state. For example, if we decide that the SSN is the natural key of our Person class, we could write:

public partial class Person : IComparable<Person>
{
    public int CompareTo(Person other)
    {
        // handle nulls

        return Comparer<string>.Default.Compare(Ssn, other.Ssn);
    }
}

We could consequently sort our Person[] array by SSN by calling Array.Sort<T>(T[]):

Array.Sort(persons);

IComparer<T>

A type may have more than one candidate sort order. The most popular case is the String class, which has six variants of the alphabetical order for performing sort-order comparisons: ordinal comparisons, linguistic comparisons under the invariant culture, linguistic comparisons under the current culture, and the case-insensitive counterparts of the aforementioned three. Only one sort order can be used as the default when implementing IComparable<T> – for strings, it’s case-sensitive linguistic comparisons under the current culture.

The other sort orders can be implemented through another interface that .NET provides: IComparer<T>. Whilst IComparable<T> is intrinsic to (and implemented by) the element type, IComparer<T> is extrinsic, intended to be implemented by a separate type. Its Compare(T, T) method takes two instances of the element type, x and y, and returns an integer value with semantics similar to those discussed earlier:

Value Meaning
Less than zero x is less than y.
Zero x equals y.
Greater than zero x is greater than y.

Rather than implementing the IComparer<T> interface directly, it is recommended to derive from the Comparer<T> abstract class. Comparer<T> provides an explicit interface implementation of the IComparer non-generic interface that forwards Compare(object, object) calls to the Compare(T, T) overridden method. Comparer<T> also provides a Default static property, which compares objects of type T using their implementation of the IComparable<T> generic interface (or, as fallback, the IComparable non-generic interface).

Most of the sorting facilities in the .NET Framework Class Library have a method or constructor overload that accepts such an IComparer<T> instance. For example, if we occasionally wanted to sort our Person[] array by date of birth, we could implement a custom comparer that provides this sort order:

public class PersonDateOfBirthComparer : Comparer<Person>
{
    public new static PersonDateOfBirthComparer Default { get; } = new PersonDateOfBirthComparer();

    public override int Compare(Person x, Person y)
    {
        // handle nulls

        return Comparer<DateTime>.Default.Compare(x.DateOfBirth, y.DateOfBirth);
    }
}

We could then sort our array using the Array.Sort<T>(T[], IComparer<T>) method overload:

Array.Sort(persons, PersonDateOfBirthComparer.Default);

Similarly, if we wanted to sort by full name, using a case-insensitive linguistic comparison under the current culture, we could define another comparer:

public class PersonFullNameComparer : Comparer<Person>
{
    public new static PersonFullNameComparer Default { get; } = new PersonFullNameComparer();

    public override int Compare(Person x, Person y)
    {
        // handle nulls

        return StringComparer.CurrentCultureIgnoreCase.Compare(
            $"{x.FirstName} {x.LastName}",
            $"{y.FirstName} {y.LastName}");
    }
}

However, this quickly gets unwieldy. What if we wanted to sort by full name using an ordinal or case-sensitive comparison? What if we introduced a new property to sort by? Ten new properties? Keep in mind that the // handle nulls comments above expand to several lines of code in each implementation:

if (object.ReferenceEquals(x, y))
    return 0;
if (x == null)
    return -1;
if (y == null)
    return 1;

LINQ

With LINQ, one-off key-based sorts can be easily achieved through the OrderBy<TSource, TKey> extension method, which accepts a function delegate that projects the key to sort by. It even offers an overload that accepts an IComparer<TKey> instance to use on the projected keys.

var bySsn         = persons.OrderBy(p => p.Ssn);
var byDateOfBirth = persons.OrderBy(p => p.DateOfBirth);
var byFullName    = persons.OrderBy(p => $"{p.FirstName} {p.LastName}", StringComparer.CurrentCultureIgnoreCase);

Sorted Collections

However, most of the rest of the .NET Framework Class Library still relies on the notion of comparers, especially in the case of sorted collection types, such as SortedSet<T>, which typically expose constructor overloads that accept an IComparer<T> argument. If we wanted to maintain a sorted collection of our Person instances under non-default sort orders, we would have to implement an IComparer<T> for each such sort order.

A possible workaround would be to use the SortedList<TKey, TValue> or SortedDictionary<TKey, TValue> structures instead, which introduce the notion of keys. However, these structures treat keys as extrinsic to the values. They have no functionality for projecting keys from values, requiring consumers to do so manually for each call that modifies their contents (for example, Add(TKey, TValue), Remove(TKey), …). This is not desirable for scenarios where keys are always projected consistently – the key projection function should be encapsulated within the data structure, much like SortedSet<T> does when combined with our custom comparers.

Proposed solution: KeyComparer<TSource, TKey>

Our proposed solution is to provide a KeyComparer<TSource, TKey> class that facilitates the creation of IComparer<TSource> instances, which could then be used with sorted collection types such as SortedSet<TSource>. Rather than having to define a new class implementing IComparer<TSource> for each key projection, consumers would only need to supply their key projection function, typically as a lambda expression, to our factory method.

Consider the example below, where a SortedSet<Person> collection is initialized with a comparer that sorts elements by their SSN:

var bySsn = new SortedSet<Person>(persons, KeyComparer.Create((Person p) => p.Ssn));

SortedSet<T> does not allow duplicate elements, so it should not be used for keys that are not guaranteed to be unique. For non-unique keys, we can use a third-party sorted collection type that permits such duplicates, such as the OrderedBag<T> collection from Wintellect’s Power Collections for .NET. Pairs of elements considered to be equal (namely, for which Compare(x, y) returns 0) would either preserve their original order or be sorted arbitrarily, depending on whether the underlying sorting algorithm is stable.

var byDateOfBirth = new OrderedBag<Person>(persons, KeyComparer.Create((Person p) => p.DateOfBirth));
var byFullName    = new OrderedBag<Person>(persons, KeyComparer.Create((Person p) => $"{p.FirstName} {p.LastName}", 
                                                                       StringComparer.Ordinal));

Just like OrderBy<TSource, TKey>, the KeyComparer.Create<TSource, TKey> factory method can accept an IComparer<TKey> inner comparer to be applied to the projected keys. In the last example above, an ordinal comparison is used for the full names. If such an inner comparer is not specified, Comparer<TKey>.Default is used.

Subsequent element operations on the sorted collections are straightforward. Each collection would internally take care of extracting the elements’ keys through the specified KeyComparer<TSource, TKey>, and use it to find the correct location for new or existing elements:

var person = new Person { Ssn = "301-00-1582", DateOfBirth = new DateTime(1984, 11, 01), FirstName = "Teddy", LastName = "Wake" };
bySsn.Add(person);
bySsn.Remove(person);

Contrast this with the explicit keys needed to be specified by consumers of SortedDictionary<TKey, TValue> (or SortedList<TKey, TValue>) for each operation on the collection. The lack of encapsulation of the key projection logic makes such operations more error-prone, since the consumer needs to ensure that the correct key is manually supplied each time:

var bySsnDict = new SortedDictionary<string, Person>(persons.ToDictionary(p => p.Ssn));
bySsnDict.Add(person.Ssn, person);
bySsnDict.Remove(person.Ssn);

As mentioned at the beginning, the key can be any function over the element’s state. For example, we could sort a sequence of numbers by their absolute (unsigned) value:

var nums = new [] { 47, -32, -54, 18, 62, -71, 58 };
var byAbs = new SortedSet<int>(nums, KeyComparer.Create((int n) => Math.Abs(n)));
// result: 18, -32, 47, -54, 58, 62, -71

Implementation

Below is a simplified implementation of this comparer class:

public class KeyComparer<TSource, TKey> : Comparer<TSource>
{
    private readonly Func<TSource, TKey> _keySelector;
    private readonly IComparer<TKey> _innerComparer;

    protected internal KeyComparer(
        Func<TSource, TKey> keySelector, 
        IComparer<TKey> innerComparer = null)
    {
        _keySelector = keySelector;
        _innerComparer = innerComparer ?? Comparer<TKey>.Default;
    }

    public override int Compare(TSource x, TSource y)
    {
        if (object.ReferenceEquals(x, y))
            return 0;
        if (x == null)
            return -1;
        if (y == null)
            return 1;

        TKey xKey = _keySelector(x);
        TKey yKey = _keySelector(y);
        return _innerComparer.Compare(xKey, yKey);
    }
}

Instances of the comparer should be created through the following factory:

public static class KeyComparer
{
    public static KeyComparer<TSource, TKey> Create<TSource, TKey>(
        Func<TSource, TKey> keySelector, 
        IComparer<TKey> innerComparer = null)
    {
        return new KeyComparer<TSource, TKey>(keySelector, innerComparer);
    }
}

A polished implementation of the KeyComparer<TSource, TKey> class and its factory – refactored, XML-documented, and unit-tested – is available as part of our DogmaMix.Core library on GitHub.

Further considerations

Equality comparisons

The KeyComparer<TSource, TKey> class intentionally refrains from implementing the IEqualityComparer<TSource> interface, for reasons given hereunder.

Implementations of IEqualityComparer<T> provide a custom equality comparison, intended for use in scenarios where one needs to enforce uniqueness (or grouping) of elements, but not a sort order. For example, the HashSet<T> collection type has constructors that accept an IEqualityComparer<T> argument to use when comparing values in the set. The EqualityComparer<T> class has a Default static property, which compares objects of type T using their implementation of the IEquatable<T> interface (or, as fallback, their overrides of the Equals(object) and GetHashCode() methods).

If an element type implements both IComparable<T> and IEquatable<T> interfaces, one might be tempted to assume, for any given pair of values x and y, that x.CompareTo(y) returning 0 would be semantically equivalent to x.Equals(y) returning true. However, this is not necessarily the case. In fact, the expectation is broken by the String class in the .NET Framework Class Library, which implements IComparable<String> to perform a culture-sensitive linguistic comparison under the current culture, but implements IEquatable<String> to perform an ordinal comparison. Since Comparer<T>.Default and EqualityComparer<T>.Default respectively rely on the IComparable<T> and IEquatable<T> implementations of type T, they would also exhibit these inconsistencies, leading to anomalies if used interchangeably.

// Evaluates to 0, indicating equality:
int a = Comparer<string>.Default.Compare("café", "cafe\u0301");

// Evaluates to false, indicating inequality:
bool b = EqualityComparer<string>.Default.Equals("café", "cafe\u0301");

Our KeyComparer<TSource, TKey> class does not oblige consumers to provide an IComparer<TKey> instance for comparing the keys projected from the elements. Indeed, for most key types, consumers would omit this argument from the factory method call, and rely on KeyComparer<TSource, TKey> to internally use Comparer<TKey>.Default. For KeyComparer<TSource, TKey> to also support equality comparisons, it would also need to internally use EqualityComparer<TKey>.Default, leading to the inconsistencies discussed above.

The .NET Framework Class Library redresses the inconsistency for String by providing (and recommending) the StringComparer class, whose instances have consistent semantics:

For string comparisons, the StringComparer class is recommended over Comparer<String>. Properties of the StringComparer class return predefined instances that perform string comparisons with different combinations of culture-sensitivity and case-sensitivity. The case-sensitivity and culture-sensitivity are consistent among the members of the same StringComparer instance.

Unfortunately, there is no interface in the .NET Framework to mark comparer types that implement both IComparer<T> and IEqualityComparer<T> consistently (nor for element types that implement both IComparable<T> and IEquatable<T> consistently). The only way of implementing a KeyComparer<TSource, TKey> class that can support both sort-order comparisons and equality comparisons would be to offload the responsibility for providing a consistent IComparer<TKey> and IEqualityComparer<TKey> pair to the consumers, hindering the flexibility of the class. Rather than take this approach, we provide a separate comparer for key equality comparisons through our KeyEqualityComparer<TSource, TKey> class.

Comparer<T>.Create

The .NET Framework 4.5 introduces a Comparer<T>.Create static method. This method serves a purpose that bears some similarity to our KeyComparer<TSource, TKey> class, in that it allows custom comparers to be created through function delegates (typically lambda expressions), rather than consumers having to implement a new comparer type each time. However, the delegate type differs – our KeyComparer.Create<TSource, TKey> factory method takes Func<TSource, TKey> (just like OrderBy<TSource, TKey> in LINQ), whilst Comparer<T>.Create takes Comparison<T>. The Comparison<T> delegate is equivalent to Func<T, T, int>, and is intended to represent a method that compares two objects of the same type.

Comparer<T>.Create is more flexible than our KeyComparer<TSource, TKey>, allowing consumers to define comparisons that are more complex than mere key projections. On the other hand, it requires more work to use correctly – consumers need to manually handle null checks, project the keys from both objects, and perform the key comparison using the appropriate inner comparer. For example, the full-name comparison discussed earlier could be implemented as follows:

var fullNameComparer = Comparer<Person>.Create((x, y) =>
{
    if (object.ReferenceEquals(x, y))
        return 0;
    if (x == null)
        return -1;
    if (y == null)
        return 1;

    return StringComparer.CurrentCulture.Compare(
        $"{x.FirstName} {x.LastName}",
        $"{y.FirstName} {y.LastName}");
});

Contrast this with the simpler implementation under KeyComparer<TSource, TKey>:

var fullNameComparer = KeyComparer.Create(
    (Person p) => $"{p.FirstName} {p.LastName}", 
    StringComparer.CurrentCulture);

Thus, whilst Comparer<T>.Create obviates the need to define new comparer types, it still requires consumers to implement all the comparison logic, leaving our KeyComparer<TSource, TKey> preferable for key comparisons.