.NET and duck-typing, part 2, In which Things Are Enumerated
Davyd McColl
Posted on October 28, 2019
In a previous post, I introduced the concepts of duck-typing and demonstrated how, even in a strongly-typed ecosystem like .net, there's some duck-typing happening under-the-covers; most specifically, during compilation time, from a C# point-of-view (though something similar is most likely implemented for other languages targeting .net -- if it wasn't, there would be trouble...)
I want to push this just a little further:
Inbuilt .net types which already take advantage of duck-typed enumeration
One of the scenarios which consistently annoys me on this front is the usage of Regex and Matches. Let's take a look:
var someString = "Hello, world!";
var regex1 = new Regex("moo-cow");
var regex2 = new Regex("^Hello, (.+)$");
var firstMatches = regex1.Matches(someString);
Console.WriteLine($"matches for moo-cow: {firstMatches.Count}");
var secondMatches = regex2.Matches(someString);
Console.WriteLine($"matches for saying hello: {secondMatches.Count}");
If all is right with the universe, the above prints out:
matches for moo-cow: 0
matches for saying hello: 1
Which is all good and well, but the point of the second regex is that I kinda wanted to figure out who we're saying hello to. So we can use this ugly code:
var skip = true;
foreach (Match match in secondMatches)
    foreach (Group group in match.Groups)
        foreach (Capture capture in group.Captures)
        {
            if (skip)
            {
                skip = false;
                continue;
            }
            Console.WriteLine(capture.Value);
        }
I've omitted some braces that I would have normally left in to keep it a little shorter. This code is less than ideal:
- We have to manually skip the first capture because .net regexes include the overall match as the first group (side-note: if you know of a way to avoid this, let me know. I haven't figured one out yet -- even (?:) around the outer string doesn't omit it from the result)
- We had to actually know the type of values that we could enumerate. Try for yourself to update the above to use var for each foreach variable. You'll quickly see that, because these three structures use the approach explained in the prior post (GetEnumerator returns an Enumerator where Current is of type object), the compiler can't tell what we want to do with the enumerated objects. We have to specifically tell it the type of the enumeration value, and that means that we have to specifically know what type will come back. Which also means that library code (eg the provider for Regex) can never change that type!
- 3 foreach statements -- this is starting to feel like something we should be doing with LINQ.
What if we could rather write something like:
var who = secondMatches.AsEnumerable<Match>()
    .SelectMany(m => m.Groups.AsEnumerable<Group>())
    .SelectMany(g => g.Captures.AsEnumerable<Capture>())
    .Skip(1)
    .FirstOrDefault()?.Value;
Console.WriteLine($"We said hello to: {who}");
It turns out, you can, because LINQ is pretty cool. But it would be interesting to figure this out for ourselves.
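As a point of comparison, here is a minimal sketch of the same query written with Enumerable.Cast<T>(), which is the LINQ operator for collections that only expose the non-generic IEnumerable. The class and method names (CastDemo, WhoDidWeGreet) are made up for illustration and are not from the original post.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
public static class CastDemo
{
    public static string WhoDidWeGreet(string input)
    {
        var regex = new Regex("^Hello, (.+)$");
        return regex.Matches(input)
            // Cast<T>() turns a non-generic IEnumerable into IEnumerable<T>
            .Cast<Match>()
            .SelectMany(m => m.Groups.Cast<Group>())
            .SelectMany(g => g.Captures.Cast<Capture>())
            // the first capture belongs to group 0, the overall match
            .Skip(1)
            .FirstOrDefault()?.Value;
    }
}
```

Cast<T>() throws an InvalidCastException at enumeration time if an element isn't a T, so it carries the same "you have to know the type" caveat as the foreach version.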
First-pass: yield
We might be tempted to solve this quite simply with:
public static IEnumerable<T> ToEnumerable<T>(this object o)
{
    // won't compile: the compiler can't foreach over a plain object
    foreach (var item in o)
    {
        yield return (T)item;
    }
}
But there's a rather large problem here: having told the compiler that we'd like to apply this to potentially every type out there, it has no idea that the type we're trying to deal with has the fancy GetEnumerator method. So we have to help. We could make a decorator class which uses reflection to do the heavy lifting and presents the implicit (compile-time) interface that the compiler expects:
using System;
using System.Reflection;

public class EnumerableDecorator<T1>
{
    // Presents the Current / MoveNext / Reset shape that foreach looks for,
    // forwarding each call to the delegates it was given.
    public class Enumerator<T2>
    {
        private readonly Func<T2> _fetchCurrent;
        private readonly Func<bool> _moveNext;
        private readonly Action _reset;

        public Enumerator(
            Func<T2> fetchCurrent,
            Func<bool> moveNext,
            Action reset)
        {
            _fetchCurrent = fetchCurrent;
            _moveNext = moveNext;
            _reset = reset;
        }

        public T2 Current => _fetchCurrent();

        public bool MoveNext()
        {
            return _moveNext();
        }

        public void Reset()
        {
            _reset();
        }
    }

    private readonly object _wrapped;
    private readonly MethodInfo _getEnumeratorMethod;

    public EnumerableDecorator(object o)
    {
        _wrapped = o;
        _getEnumeratorMethod = o.GetType().GetMethod("GetEnumerator");
    }

    public Enumerator<T1> GetEnumerator()
    {
        // look up the enumerator's members by name, via reflection
        var enumerator = _getEnumeratorMethod.Invoke(_wrapped, NoArgs);
        var enumeratorType = enumerator.GetType();
        var moveNextMethod = enumeratorType.GetMethod("MoveNext");
        var resetMethod = enumeratorType.GetMethod("Reset");
        var currentProp = enumeratorType.GetProperty("Current");

        return new Enumerator<T1>(fetchCurrent, moveNext, reset);

        bool moveNext()
        {
            return (bool)moveNextMethod.Invoke(enumerator, NoArgs);
        }

        void reset()
        {
            resetMethod.Invoke(enumerator, NoArgs);
        }

        T1 fetchCurrent()
        {
            return (T1)currentProp.GetValue(enumerator);
        }
    }

    private static readonly object[] NoArgs = new object[0];
}
(There are plenty of ways this code could be better -- this is just the simplest code to accomplish the task at hand with the optimistic expectation that no-one expects this to handle objects which don't implement the implicit enumerable interface)
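The extension method that ties the decorator to LINQ isn't spelled out above, so here is a rough sketch of how it might look: foreach over the decorator (which the compiler now accepts) and yield each item. To keep the example self-contained it uses a condensed stand-in, MiniEnumerableDecorator<T>, rather than the full class above; that name, DuckEnumerableExtensions and DuckDemo are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

// Condensed, hypothetical stand-in for the decorator above (no Reset support).
public class MiniEnumerableDecorator<T>
{
    private readonly object _wrapped;

    public MiniEnumerableDecorator(object wrapped)
    {
        _wrapped = wrapped;
    }

    public Enumerator GetEnumerator()
    {
        var method = _wrapped.GetType().GetMethod("GetEnumerator", Type.EmptyTypes);
        return new Enumerator(method.Invoke(_wrapped, null));
    }

    public class Enumerator
    {
        private readonly object _inner;
        private readonly MethodInfo _moveNext;
        private readonly PropertyInfo _current;

        public Enumerator(object inner)
        {
            _inner = inner;
            _moveNext = inner.GetType().GetMethod("MoveNext", Type.EmptyTypes);
            _current = inner.GetType().GetProperty("Current");
        }

        public T Current => (T)_current.GetValue(_inner);

        public bool MoveNext() => (bool)_moveNext.Invoke(_inner, null);
    }
}

public static class DuckEnumerableExtensions
{
    // The decorator has the shape foreach wants, so this compiles
    // even though `o` is just an object.
    public static IEnumerable<T> ToEnumerable<T>(this object o)
    {
        foreach (var item in new MiniEnumerableDecorator<T>(o))
        {
            yield return item;
        }
    }
}

public static class DuckDemo
{
    public static string WhoDidWeGreet(string input)
    {
        var matches = new Regex("^Hello, (.+)$").Matches(input);
        return matches.ToEnumerable<Match>()
            .SelectMany(m => m.Groups.ToEnumerable<Group>())
            .SelectMany(g => g.Captures.ToEnumerable<Capture>())
            .Skip(1)
            .FirstOrDefault()?.Value;
    }
}
```

One caveat with looking members up on the runtime enumerator type: enumerators that only implement MoveNext and Current explicitly (compiler-generated iterators from yield, for example) have no public members of those names, so this approach won't find them.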
Applying this to our original LINQ code:
var who = secondMatches.ToEnumerable<Match>()
    .SelectMany(m => m.Groups.ToEnumerable<Group>())
    .SelectMany(g => g.Captures.ToEnumerable<Capture>())
    .Skip(1)
    .FirstOrDefault()?.Value;
Console.WriteLine($"We said hello to: {who}");
... which works the same.
Why bother?
Fair enough. LINQ already does this for us, though older versions of LINQ may not have supported this. More importantly, when we try to implement something built-in, we get to understand more of the underlying nuts-and-bolts. When there's less "magic" in how things work, we can make better choices about how to use those things, and we can use that knowledge to further our other programming needs.
Is this duck-typing?
Yes! We've used a decorator to provide an implicit interface to a collection that the compiler understands, allowing us to take a pure object with the correct shape and get a collection we can operate on with functional methods. The extension method ToEnumerable duck-types any object with the correct shape to be enumerable. Of course, it's lacking in that it doesn't give good errors when that duck-typing fails. For example:
foreach (var item in new EnumerableDecorator<object>(123))
{
    // will explode at runtime with a NullReferenceException,
    // because `GetMethod` and `GetProperty` can return `null`
    // when the requested members aren't found
}
It also doesn't check that the totally correct "shape" exists on the wrapped object:
- GetEnumerator should return an object with members:
  - Current (object or higher)
  - MoveNext() (returns bool)
  - Reset() (returns void)
But that's just validation which I'll leave to you to implement.
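If you'd like a starting point for that validation, a minimal sketch might look like the following. DuckEnumerableValidator and HasEnumerableShape are hypothetical names. The sketch checks only GetEnumerator, MoveNext and Current, since those are the members foreach relies on; it deliberately treats Reset as optional, because many enumerators don't expose a public Reset.

```csharp
using System;

// Hypothetical helper: reports why an object can't be duck-typed as enumerable.
public static class DuckEnumerableValidator
{
    public static bool HasEnumerableShape(object candidate, out string problem)
    {
        problem = null;
        if (candidate == null)
        {
            problem = "object is null";
            return false;
        }

        var getEnumerator = candidate.GetType().GetMethod("GetEnumerator", Type.EmptyTypes);
        if (getEnumerator == null)
        {
            problem = $"{candidate.GetType()} has no public parameterless GetEnumerator()";
            return false;
        }

        // Like the decorator, inspect the runtime type of the enumerator.
        var enumerator = getEnumerator.Invoke(candidate, null);
        if (enumerator == null)
        {
            problem = "GetEnumerator() returned null";
            return false;
        }

        try
        {
            var enumeratorType = enumerator.GetType();

            var moveNext = enumeratorType.GetMethod("MoveNext", Type.EmptyTypes);
            if (moveNext == null || moveNext.ReturnType != typeof(bool))
            {
                problem = $"{enumeratorType} has no public bool MoveNext()";
                return false;
            }

            var current = enumeratorType.GetProperty("Current");
            if (current == null || !current.CanRead)
            {
                problem = $"{enumeratorType} has no readable public Current property";
                return false;
            }

            return true;
        }
        finally
        {
            // Be polite to enumerators that hold resources.
            (enumerator as IDisposable)?.Dispose();
        }
    }
}
```

A decorator constructor could call this and throw an ArgumentException carrying the problem text, instead of failing later with a NullReferenceException.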
Next up: ToEnumerable<T> should return an object that does the full job of wrapping & providing enumeration instead of doing the enumeration itself -- we're going to make EnumerableDecorator actually implement IEnumerable<T>. You may ask why?
- Simpler logic: the enumeration should happen encapsulated in the decorator object so that the extension method is only responsible for performing the wrapping. We may have use for this in other places where perhaps we'd prefer not to use the extension method (for whatever reason). Also, it's just good separation of concerns.
- It's good to consider inbuilt interfaces when working with your own classes: if you're making library code, people already know how to deal with IEnumerable<T> -- they know what to expect from it, and they don't really have to think about how things are done. One of the most useful interfaces in .net (imo) is IDictionary<TKey, TValue>, most especially IDictionary<string, object>, which provides an interface similar to what we'd expect from regular old JavaScript objects. I'll be returning to that interface later -- there comes a time when it becomes pivotal to my particular implementation of a generic duck-typer.
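To make the first point a little more concrete, here is one possible shape for a wrapper that implements IEnumerable<T> itself. This is only a sketch under my own assumptions, not the implementation this series goes on to build; DuckEnumerable<T> is a made-up name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical wrapper: all the enumeration logic lives here, so an
// extension method would only need to do `new DuckEnumerable<T>(o)`.
public class DuckEnumerable<T> : IEnumerable<T>
{
    private readonly object _wrapped;
    private readonly MethodInfo _getEnumerator;

    public DuckEnumerable(object wrapped)
    {
        _wrapped = wrapped ?? throw new ArgumentNullException(nameof(wrapped));
        // Fail early, with a readable message, when the shape is missing.
        _getEnumerator = wrapped.GetType().GetMethod("GetEnumerator", Type.EmptyTypes)
            ?? throw new ArgumentException(
                $"{wrapped.GetType()} has no public parameterless GetEnumerator()",
                nameof(wrapped));
    }

    public IEnumerator<T> GetEnumerator()
    {
        var inner = _getEnumerator.Invoke(_wrapped, null);
        var innerType = inner.GetType();
        var moveNext = innerType.GetMethod("MoveNext", Type.EmptyTypes);
        var current = innerType.GetProperty("Current");

        // Drive the wrapped enumerator by reflection, yielding typed items.
        while ((bool)moveNext.Invoke(inner, null))
        {
            yield return (T)current.GetValue(inner);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the wrapper is a real IEnumerable<T>, every LINQ operator works on it directly, and callers don't need to know that reflection is involved.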