# Sunday, October 19, 2008

How many of you have played with the C# yield statement? I guess most of you have.
Anyway, like most of the MSDN examples out there, when used incorrectly it can introduce very bad behavior into your code. Consider the following:

public IEnumerable<User> GetUsers(int count)
{
    using (MySqlConnection connection = new MySqlConnection("..."))
    {
        // MySqlDataReader reader = create a MySqlCommand and execute it
        while (reader.Read())
            yield return new User(/*... fill parameters from the reader ... */);
    }
}

Looks pretty harmless, right? Not quite. The compiler transforms a method containing yield into a "state machine", which means that every time we yield a result back to the client (the caller of GetUsers in our example), execution pauses inside the method until the client asks for the next item (via IEnumerator<T>.MoveNext()). The code above therefore holds the connection open until the client is done iterating over all of the User items (or disposes the enumerator). This leads to major scalability issues very quickly! You should always keep your DB connections open for as short a period as possible to prevent connection exhaustion (threads waiting a long time for an available DB connection in the pool, until they time out). Because yield returns control to the caller, the caller may "take his time", thus leading to connection exhaustion.
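One common way around this, sketched here under the assumption that the result set fits comfortably in memory, is to read everything into a list while the connection is open and hand it back only after the connection is closed. FakeConnection and UserRepository are hypothetical stand-ins (not a real MySQL API), used only so the difference is visible without a database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DB connection; it only counts how many are open.
public sealed class FakeConnection : IDisposable
{
    public static int OpenCount;

    public FakeConnection() { OpenCount++; }
    public void Dispose() { OpenCount--; }

    // Pretends to stream rows from the database.
    public IEnumerable<string> ReadNames()
    {
        yield return "alice";
        yield return "bob";
    }
}

public static class UserRepository
{
    // Lazy: the using block is left only when the caller finishes
    // (or disposes) the enumeration.
    public static IEnumerable<string> GetUsersLazy()
    {
        using (FakeConnection connection = new FakeConnection())
        {
            foreach (string name in connection.ReadNames())
                yield return name;
        }
    }

    // Eager: the connection is closed before the caller sees the first item.
    public static List<string> GetUsersEager()
    {
        using (FakeConnection connection = new FakeConnection())
        {
            return new List<string>(connection.ReadNames());
        }
    }
}
```

While a caller is in the middle of a foreach over GetUsersLazy(), FakeConnection.OpenCount stays at 1; with GetUsersEager() it is back to 0 before the caller receives the list. The price is that the eager version gives up the memory savings discussed next, so it is a trade-off rather than a rule.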


On the other hand, when used wisely, yield yields (lame joke, sorry) HUGE benefits:
[note: code written in notepad, stupidity won't compile]

  • Avoid useless memory allocations

#1:
How many times have you ended up writing something like this:

    public List<T> Filter(List<T> input, Predicate<T> predicate)
    {
        List<T> output = new List<T>(input.Count / 2);
        foreach (T item in input)
            if (predicate(item))
                output.Add(item);

        return output;
    }

We allocate much more memory than we need, only to hold the output during the calculation. A better approach would be:

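    // foreach needs an IEnumerable<T> (not an IEnumerator<T>), and returning
    // IEnumerable<T> lets the caller foreach over the result as well.
    public IEnumerable<T> Filter(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }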

This way we never build a second list; only the current item is held at any time (it is stored in the generated state machine). In addition, the client can choose to produce each input item via yield as well, which removes the need to build the "input" collection before calling our Filter method.
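To illustrate that last point, here is a small sketch in which the input itself is produced with yield, so no intermediate list ever exists. Pipeline, Numbers and the static generic form of Filter are names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Pipeline
{
    // Produces 1..count one item at a time; nothing is stored up front.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return i;
    }

    // Same idea as the Filter method above, written as a static generic method.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> input, Predicate<T> predicate)
    {
        foreach (T item in input)
            if (predicate(item))
                yield return item;
    }
}
```

Iterating Pipeline.Filter(Pipeline.Numbers(10), x => x % 2 == 0) with foreach visits the even numbers 2 through 10, and each number is generated, tested and handed over before the next one is even created.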


#2:

Another oh-(gosh-why)-so-common example is the following:

public void Save(T item)
{
    Save(new T[] { item });
}

public void Save(ICollection<T> items)
{
    // do your magic here to save items
}

Assuming you call Save with a single item quite a lot, you're allocating A LOT of memory just to create one-item arrays. A better approach would be:

public void Save(T item)
{
    Save(Yield.One(item));
}

public void Save(IEnumerable<T> items)
{
    // do your magic here to save items (e.g. foreach over items)
}

public static class Yield
{
    // Wraps a single item in a lazy one-element sequence.
    public static IEnumerable<T> One<T>(T item)
    {
        yield return item;
    }
}

No throwaway array here; the only allocation left is the small iterator object the compiler generates.

  • Avoid "impossible" memory allocation

Let's say you want to read a 20 GB file of emails where every line holds a single email. Trying to declare a List<string> and fill it up will obviously blow up your memory. You simply can't hold that much in memory. Instead, you can use StreamReader.ReadLine and yield each line back to your client, until all of the emails are taken care of. Yes, you could read the file in chunks yourself (keeping a pointer), but keeping track of where you stopped is exactly what yield does for you under the hood. Reminder: the DRY principle is gold (or Don't Repeat the .NET Framework, in our case).
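A minimal sketch of that idea follows. EmailFile and ReadEmails are hypothetical names, and the file is assumed to be plain text with one email per line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EmailFile
{
    // Yields the file one line at a time; only the current line is in memory.
    public static IEnumerable<string> ReadEmails(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            // ReadLine returns null once the end of the file is reached.
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```

A client would simply write foreach (string email in EmailFile.ReadEmails(path)) and process each email as it arrives. Note that the file stays open until the client finishes iterating (or disposes the enumerator), which is the same trade-off as with the DB connection earlier, only with a much cheaper resource. On .NET 4 and later, File.ReadLines offers the same lazy behavior out of the box.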

  • Execute synchronous code asynchronously (nicely achieved via CCR)

A bit advanced, but you can read all about it here. The great benefit is you can transform (almost) any "yield based" code to run async, if needed/wanted.


Recap:
Prefer using the yield statement as long as calculating the values doesn't require holding an expensive resource, like a DB connection or a file handle, for a long period of time.

Posted by Oren Ellenbogen 
19/10/2008 12:35, Israel time UTC-07:00, Comments [3]
Sunday, October 19, 2008 12:48:49 PM (Jerusalem Standard Time, UTC+02:00)
Re #2: since each iterator is in practice a generated nested class (the aforementioned state machine), I doubt this will have any significant impact on memory allocation. I don't see why allocating a class instance (simple though it is) is any different from allocating an intermediate array. As an aside, the GC is optimized for precisely such usage scenarios (short-lived temporary variables), so although there's obviously a cost associated with the allocation it's probably considerably lower than a malloc/free combination.

Monday, October 27, 2008 2:46:09 PM (Jerusalem Standard Time, UTC+02:00)
My experience shows that the yield statement may introduce hard-to-find bugs, and I would avoid using it in 99% of cases in any project with more than one developer on it.
Sunday, November 16, 2008 10:26:35 PM (Jerusalem Standard Time, UTC+02:00)
@Tomer - You're correct, in most cases. If you try to make millions of calls, you'll see a noticeable difference. Because not many people make that number of calls or require that level of optimization, I agree with you. This might not be the best example, and yet it's an interesting usage concept, so I think it's still relevant to understand.

@Pasha - so does writing multi-threaded code, and yet avoiding it is not an option for some (I'd say most) applications. A rule of "avoid it 99% of the time if you're not the only developer" is not really a good argument. I would love to discuss any actual examples you've got.