Thursday, June 11, 2015

.NET Collections & deferred execution with IQueryable

One of the advantages of modern programming languages and frameworks is that they provide a lot of functionality to "make it easier" for developers to build their software. However, in most cases it is still very important to know what happens under the hood. Knowing this makes it easier for a developer to analyse the impact of changes, anticipate how third-party libraries might behave, or optimize code properly.

In this blogpost, I will try to explain the differences between the various collection interfaces supplied by the .NET Framework, and their uses. I will also give an example of how insufficient knowledge of this subject can lead to problems in the case of IQueryable. A little background information and theory is included first, to help explain some of the decisions made regarding collections.


Iterators

In object-oriented programming, it is common to use so-called "design patterns". I will not go in-depth on the concept of design patterns, but they are basically reusable solutions which developers can apply to create understandable and structured code.

An important pattern to understand for this blogpost is the "iterator pattern". The definition of the iterator pattern according to the Gang of Four:
Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
Below is a class diagram of the traditional iterator pattern. The most important aspect of this diagram is the Iterator object. By using the Next() method, you can sequentially iterate over the collection, and thereby fetch the next object.

Iterator pattern
If you feel you need more information about this pattern, now is a good time to look it up.

IEnumerable

According to MSDN, the IEnumerable interface:
Exposes an enumerator, which supports a simple iteration over a non-generic collection.

The IEnumerable interface is at the core of all collections in .NET. It is part of the .NET implementation of the iterator pattern explained above: it exposes a GetEnumerator() method that returns an IEnumerator, which plays the role of the iterator. This is also what makes the foreach keyword work on a collection.
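To make the link between foreach and the iterator pattern concrete, here is a minimal sketch of what a foreach loop roughly expands to. The class and method names (ForeachDemo, SumWithForeach, SumWithEnumerator) are made up for this illustration, and the expansion is simplified compared to what the compiler really emits.

```csharp
using System;
using System.Collections;

public static class ForeachDemo
{
    // Sums the items using the foreach keyword.
    public static int SumWithForeach(IEnumerable items)
    {
        int sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    // Roughly the same loop written out by hand (simplified).
    public static int SumWithEnumerator(IEnumerable items)
    {
        int sum = 0;
        IEnumerator enumerator = items.GetEnumerator();
        try
        {
            // MoveNext() advances the iterator and reports whether an item is available.
            while (enumerator.MoveNext())
            {
                sum += (int)enumerator.Current;
            }
        }
        finally
        {
            // foreach disposes the enumerator when it implements IDisposable.
            (enumerator as IDisposable)?.Dispose();
        }
        return sum;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        Console.WriteLine(SumWithForeach(numbers));    // 6
        Console.WriteLine(SumWithEnumerator(numbers)); // 6
    }
}
```

Both methods only rely on IEnumerable, so they work for any collection, whatever its underlying representation.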

As written earlier, it is very important to note that the iterator pattern hides the underlying representation of the collection. This makes it very easy to forget what is actually happening under the hood. We will come back to this later in this post.

ICollection

According to MSDN, the ICollection interface:
Defines size, enumerators, and synchronization methods for all nongeneric collections.

The ICollection interface inherits from the IEnumerable interface. It extends IEnumerable with some simple functionality, such as the ability to get the size of the collection (Count) and synchronization/locking support (IsSynchronized and SyncRoot).

Because it inherits from IEnumerable, you can iterate over any object that implements the ICollection interface (using the foreach keyword).
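A small sketch of what ICollection adds on top of IEnumerable could look like this. CollectionDemo and Describe are hypothetical names used only for this example.

```csharp
using System;
using System.Collections;

public static class CollectionDemo
{
    // Accepts any non-generic ICollection: an array, an ArrayList, a Queue, ...
    public static string Describe(ICollection items)
    {
        int visited = 0;
        // SyncRoot is the object the collection offers for locking.
        lock (items.SyncRoot)
        {
            // Iteration is available because ICollection inherits from IEnumerable.
            foreach (object item in items)
            {
                visited++;
            }
        }
        // Count is known without iterating.
        return $"Count={items.Count}, visited={visited}";
    }

    public static void Main()
    {
        var queue = new Queue();
        queue.Enqueue("a");
        queue.Enqueue("b");
        Console.WriteLine(Describe(queue));             // Count=2, visited=2
        Console.WriteLine(Describe(new[] { 1, 2, 3 })); // Count=3, visited=3
    }
}
```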

IList

According to MSDN, the IList interface:
Represents a non-generic collection of objects that can be individually accessed by index.

The IList interface provides methods to manipulate the collection, such as Add(), Insert(), Remove() and RemoveAt(). It also allows the developer to access the elements by index (as defined by MSDN).

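Some familiar concrete implementations of this interface:
  • Array (also int[], etc.)
  • List
  • ArrayList
Note that LinkedList is not among them: it implements ICollection but not IList, since a linked list cannot offer efficient access by index.

Again, since this interface inherits (via ICollection) from IEnumerable, it is possible to iterate over these objects.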
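The following sketch shows indexing and manipulation through IList. It also shows that an implementation may refuse some operations: an array can be indexed through IList, but it has a fixed size. ListDemo and ReplaceFirst are names invented for this example.

```csharp
using System;
using System.Collections;

public static class ListDemo
{
    // Replaces the first element through the IList indexer and returns the old value.
    public static object ReplaceFirst(IList items, object replacement)
    {
        object previous = items[0];
        items[0] = replacement;
        return previous;
    }

    public static void Main()
    {
        IList resizable = new ArrayList { "a", "b" };
        resizable.Add("c");      // an ArrayList can grow
        resizable.RemoveAt(0);   // now contains: b, c
        Console.WriteLine(ReplaceFirst(resizable, "x")); // b
        Console.WriteLine(resizable[0]);                 // x

        IList fixedSize = new[] { 1, 2, 3 };
        Console.WriteLine(fixedSize.IsFixedSize);        // True
        fixedSize[0] = 10;       // indexing works on an array
        try
        {
            fixedSize.Add(4);    // but an array cannot grow
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("Add is not supported on an array");
        }
    }
}
```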

Potential problem with IQueryable

According to MSDN, the IQueryable interface:
Provides functionality to evaluate queries against a specific data source wherein the type of the data is not specified.
This interface is, semantically speaking, not an actual "collection" interface. However, it inherits from the IEnumerable interface, and this makes it a perfect example of how insufficient knowledge of the above can cause a pretty bad situation.

First, a little background information on the IQueryable interface:

IQueryable?

Since the rise of ORM tools (Object-relational mapping), it has become "easier" for developers to interact with databases or datasets. Instead of writing queries, it has become possible to use a code-based approach, and let a query generator create the actual queries for you. This makes code incredibly portable, since you don't have to write the queries for each different database engine.

Some ORM tools expose this functionality using the IQueryable interface, which in itself is a great idea. It provides a single interface with which you can dynamically build database/dataset queries, using LINQ for example.

But the IQueryable interface inherits from the IEnumerable interface. This allows us to iterate the object, which is something we all want to do at some point. However, in most cases, every time you use the foreach keyword on such an object, you execute a query on the database. That means that if you iterate the same object twice, the same query is executed twice.

There is, of course, a reason for this. If you perform an action on a collection, modify the underlying dataset, and then perform the same action again, you would expect the results to reflect the change.

The point is that some developers may regard an IQueryable object as a collection that, at a certain point, has been filled and no longer changes. This may lead to unintended consequences, such as too much load on a database server.

A commonly made mistake is to pass around references to the IQueryable object instead of retrieving the required results once, for example with ToList() or ToArray(), and keeping those results in memory.
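The effect can be imitated without a database. In the sketch below, an in-memory sequence wrapped with AsQueryable() stands in for an ORM query, and a counter is incremented every time the "table" is read. This is an assumption made for illustration: a real ORM provider would translate the query to SQL instead, and DeferredDemo, FakeTable and EvenRows are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int QueriesExecuted;

    // Stand-in for a database table: every enumeration counts as one "query".
    private static IEnumerable<int> FakeTable()
    {
        QueriesExecuted++;
        for (int i = 1; i <= 5; i++)
        {
            yield return i;
        }
    }

    public static IQueryable<int> EvenRows()
    {
        // Nothing is executed here; this only describes the query.
        return FakeTable().AsQueryable().Where(n => n % 2 == 0);
    }

    public static void Main()
    {
        IQueryable<int> query = EvenRows();
        Console.WriteLine(QueriesExecuted); // 0, building the query executes nothing

        foreach (int n in query) { }
        foreach (int n in query) { }
        Console.WriteLine(QueriesExecuted); // 2, one execution per iteration

        List<int> rows = query.ToList();    // third and last execution
        foreach (int n in rows) { }
        foreach (int n in rows) { }
        Console.WriteLine(QueriesExecuted); // 3, the list is iterated from memory
    }
}
```

Passing rows around instead of query means the data source is read once, at the price of working with a snapshot that does not reflect later changes.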

Code Example

Because I know you like code examples! In this one I created my own simple implementation of IEnumerable & IEnumerator wrapped around an array so we can see what happens.


And this is the output:


As you can see, when iterating the object the second time, the index is reset to 0 and MoveNext() is called 10 times again.
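The wrapper described above can be sketched as follows. This is a minimal reconstruction under assumptions, not necessarily identical to the implementation used for this post: the names LoggingArray and LoggingEnumerator are hypothetical, and a three-element array is used to keep the output short.

```csharp
using System;
using System.Collections;

// Hypothetical iterator that logs every step.
public class LoggingEnumerator : IEnumerator
{
    private readonly int[] _items;
    private int _index = -1; // positioned before the first element

    public static int MoveNextCalls;

    public LoggingEnumerator(int[] items)
    {
        _items = items;
        Console.WriteLine("New enumerator created, index reset");
    }

    public object Current => _items[_index];

    public bool MoveNext()
    {
        MoveNextCalls++;
        _index++;
        bool hasItem = _index < _items.Length;
        Console.WriteLine($"MoveNext() -> index {_index}, has item: {hasItem}");
        return hasItem;
    }

    public void Reset() => _index = -1;
}

// Hypothetical collection wrapped around an array.
public class LoggingArray : IEnumerable
{
    private readonly int[] _items;

    public LoggingArray(int[] items) { _items = items; }

    // Called once per foreach loop; every loop gets a fresh enumerator.
    public IEnumerator GetEnumerator() => new LoggingEnumerator(_items);
}

public static class Program
{
    public static void Main()
    {
        var numbers = new LoggingArray(new[] { 1, 2, 3 });
        foreach (int n in numbers) Console.WriteLine($"First pass: {n}");
        foreach (int n in numbers) Console.WriteLine($"Second pass: {n}");
    }
}
```

In this sketch each pass creates a new enumerator and calls MoveNext() four times: three calls that return true, and a final call that returns false and ends the loop.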
