3 Things I Wish I Knew When I Started Using Entity Framework

741a3-1j8khcb_mgxi_ttvtatuqsg

Entity Framework (EF) is Microsoft's object-relational-mapping tool (ORM) that allows programmers to easily map database objects to C# object models. What attracts developers to Entity Framework is the automatic mapping of SQL tables to C# model classes, a feature that removes a lot of the tedious boiler plate code that developers would otherwise have to write.

The quickest way to see why developers love Entity Framework is with an example. Without Entity Framework, the code needed to SELECT some data from SQL looks something like this:

using (SqlConnection conn = new SqlConnection(connectionString))
{
  string query = @"SELECT 
                      * 
                    FROM
                      Customers
                    WHERE
                      TransactionDate > '2016–04–01'";

  SqlCommand cmd = new SqlCommand(query, conn);
  SqlDataReader reader = cmd.ExecuteReader();

  while (reader.Read())
  {
    //Process my rows and put them into a model object
  }
}

With Entity Framework and LINQ (Language Integrated Query), I can instead write something like this:

DatabaseContext db = new DatabaseContext();

var customers = db.Customer.Where(c => c.TransactionDate > new DateTime(20160401)).ToList();

As a developer I want to be writing code like the latter example — look at how clear and concise it is! However, when I first started using Entity Framework I quickly came across specific scenarios where the ORM was not working like I would have expected. Below are my top 3 insights about Entity Framework that I had to learn the hard way.

1. Select() is your friend

When I first started using Entity Framework I was enamored by the idea of never having to write boring SQL connection and mapping code again. During that time, I was frequently writing quick concise LINQ queries like this all over the place:

var results = db.Customer.Where(c => c.State == "Ohio");

All things were great for a while until I moved my code to production, where there was a lot more data, and my webpage started taking a very long time to return results. What was the problem?

My customer summary webpage was only displaying 4 columns from the Customer table to display summary information about customers: "State", "Name", "Address", and "Email". However, my Customer table had 50 columns in it. I foolishly thought since I was only using 4 columns from the table, Entity Framework would be smart enough to only bring those 4 columns of data back from the SQL server. Turns out I was wrong — what really was running behind the scenes was this:

SELECT
  *
FROM
  Customer
WHERE
  State = 'Connecticut'

Egad! I was running a SELECT *, which all SQL practitioners know is a big no-no both for performance as well as future maintainability of your queries!

One of the issues (features?) with Entity Framework is that by default the queries all run as SELECT *. Instead of bringing back all of the columns in a table, I figured I could limit my results by using Select() to choose which columns I wanted to return:

var results = db.Customer
                .Select(x => new 
                {
                  x.State,
                  x.Name,
                  x.Address,
                  x.Email
                });

And there was much rejoicing amongst the DBAs.

2. Lazy Loading can be great or terrible

Lazy Loading is the idea that data isn't pulled from the database until it is needed (hence the term "lazy"). This is a great feature if used correctly since it minimizes the amount of data that is transferred over the network. However, if you use lazy loading (which is often the default) unknowingly, you might be querying the SQL server more than is necessary.

Let's say our Customer and Order models looks like this, where every Customer has many associated Orders:

public class Customer
{
  [Key]
  public int Id { get; set; }
  public string State { get; set; }
  public string Name { get; set; }
  public virtual ICollection<Order> Orders { get; set; }
}

public class Order
{
  [Key]
  public int Id { get; set; }
  public int CustomerId { get; set; }

  [ForeignKey("CustomerId")]
  public virtual Customer Customer { get; set; }

  public int TotalItems { get; set; }
  public decimal TotalCost { get; set; }
}

If we write a LINQ query to bring back our Customer data and only use Customer data in our view, SQL just executes 1 query and things are efficient:

Controller:

var result = db.Customer.ToList();

View:

@foreach (var item in Model) {
  <tr>
    <td>
      @Html.DisplayFor(modelItem => item.State)
    </td>
    <td>
      @Html.DisplayFor(modelItem => item.Name)
    </td>
  </tr>
}

Result:

Bringing back fields exclusive to the Customer model.

SQL Trace:

One "SQL:BatchCompleted" means only a single query ran.

As you can see, Entity Framework only ran one query against the database and we are getting good performance. Very nice!

If we go ahead and change our View to display some Order data though:

View:

@foreach (var item in Model) {
  <tr>
    <td>
      @Html.DisplayFor(modelItem => item.State)
    </td>
    <td>
      @Html.DisplayFor(modelItem => item.Name)
    </td>
<td>
      @Html.DisplayFor(modelItem => item.Orders.First().TotalCost)
    </td>
  </tr>
}

Result:

Now EF nows to bring dollar amounts back from our Order model.

SQL Trace:

The Order query is executed once for every customer — this could result in horrible performance.

Because of Lazy Loading, entity framework runs a query to retrieve an order for each customer! In my example data with 5 Customers with 1 Order each, Entity Framework runs 6 queries: 1 query to retrieve our Customers, and then 5 queries to retrieve orders for each of customers! Imagine if we were display more than 5 customer records — our SQL server would be getting slammed with queries!

The solution to this is to Eager Load the data. This means all of the data will be brought back in our initial query via a SQL JOIN. In order to do this, we just use the Include() method to retrieve all of the data up front:

Controller:

var result = db.Customer.Include(x => x.Orders).ToList();

Trace:

EF generated SQL might be ugly, but it is efficient in this case.

Entity Framework only executes 1 query. This causes a little bit more upfront overhead since it transfers the Order data all at once, which would be a waste if we didn't use that Order data in our View, but if we do plan to use Order data in our View, then using Include() reduces the number of queries that will run against the SQL server.

3. Sometimes it's OK to not use Entity Framework LINQ queries

By default, Entity Framework generates some pretty interesting queries (interesting at least in the sense that no normal human being would structure a query like this). For example, in SQL I could write:

SELECT
  State,
  Name,
  Address,
  Email
FROM
  Customer
WHERE
  CreateDate > '2016–04–01'

And the query generated by Entity Framework might look something like this:

-- Region Parameters
DECLARE @p__linq__0 DateTime2 = '2016–04–01 00:00:00.0000000'
-- EndRegion

SELECT
  [Extent1].[State] AS [State],
  [Extent1].[Name] AS [Name],
  [Extent1].[Address] AS [Address],
  [Extent1].[Email] AS [Email]
FROM 
  [dbo].[Customer] AS [Extent1]
WHERE 
  [Extent1].[CreateDate] > @p__linq__0

Extents? Parameters? Aliases? What's all of this extra junk in my query? Although the latter SQL statement works, it isn't the prettiest on the eyes. And this is the example of simple query. Once you start writing complex LINQ queries using Entity Framework that involve the likes of Join(), GroupBy(), etc… the raw SQL that gets generated can get hairy and inefficient. In those cases, I've learned that it is perfectly to rely on good old t-sql stored procedures:

string query = "EXEC dbo.USP_NiceAndCleanSqlQuery";

db.Database.ExecuteSqlCommand(query);

At the end of the day, we are using Entity Framework to make our lives easier. If a LINQ query gets too complicated or is generating inefficient queries, it is always fine to fall back on an old standby.