Data with Bert - Entity Framework

Someone recently told me that they don't need to worry about SQL injection because they are using an ORM.

Oh boy.

ORMs don't automatically prevent SQL injection

Object-relational mapping (ORM) tools allow developers to easily access an application's data layer without having to write lots of redundant code.

Most ORMs will safely parameterize certain types of queries. The following examples use Entity Framework and SQL Server, but these examples should apply to most other major ORMs and RDBMSs).

Our LINQ query making it easy to access our data layer:

c-no-injection-1

A beautiful, clean, LINQ query

And then the SQL query that our ORM generated.

xe-profiler-orm-1

A query structure that only a mother could love

You'll notice the generated SQL query is using sp_executesql that has parameterized our input variable value "TFly37". In this instance we can say the ORM did a good job following best practices in preventing a successful SQL injection attack.

But while ORMs may prevent some SQL injection attempts, there is no guarantee that they will prevent all injection attempts.

What follows are examples of when ORMs can allow for successful injection attacks.

Programatically assembled SQL

ORMs often provide the ability for developers to map the results of adhoc SQL queries to models. Adhoc queries mean potential injection vulnerabilities.

Consider the following:

programmatic-query-1

Looks a lot like your typical SQL injection vulnerability doesn't it?

Entity Framework, or any ORM for that matter, won't be able to recognize a parameter concatenated directly into the query string.

Now hopefully the developer has some really strong input validation on the "username" parameter, but the fact still stands: this query is injectable and the ORM will happily execute it.

Stored Procedures

Does the logic needed in your app already exist as a stored procedure?

Excellent! Let's be DRY (don't-repeat-yourself) and call our procedure directly from our ORM:

parameterize-procedure

Parameterized with the value AND datatype - give this developer a raise!

So in this example, the ORM and developer have done everything right. So where's the injection vulnerability?

inejct-procedure

Uh oh...

Now let me be clear: this injection vulnerability is not the ORM's fault.

However, it is the developer's fault if he is blindly using a procedure without knowing what it does. It's also the DBA's fault if she provided an injectable procedure assuming the developer would sanitize any input values in the app.

Security is hard. Everyone needs to work together and be responsible for doing everything within their means to secure their applications.

ORM injection

Technically this isn't an example of SQL injection.

But that's why the title of this post says "2.5" instead of "3".

In this example I'm using a dynamic LINQ to access my data:

linq-injectino

Hello concatenation my old friend

If we pass the value " OR 1 == 1 OR UserName== we will have the ORM convert it to the following query:

linq-injection

Game over

Injection takes many forms and doesn't just come straight from SQL. It's important to follow best practices in all languages and tools to prevent security incidents.

Want to learn more?

If you are interested in learning more about how to protect yourself from SQL injection, be sure to watch my online session at GroupBy at 9am Eastern on March 16th, 2018.

741a3-1j8khcb_mgxi_ttvtatuqsg

Entity Framework (EF) is Microsoft's object-relational-mapping tool (ORM) that allows programmers to easily map database objects to C# object models. What attracts developers to Entity Framework is the automatic mapping of SQL tables to C# model classes, a feature that removes a lot of the tedious boiler plate code that developers would otherwise have to write.

The quickest way to see why developers love Entity Framework is with an example. Without Entity Framework, the code needed to SELECT some data from SQL looks something like this:

using (SqlConnection conn = new SqlConnection(connectionString))
{
  string query = @"SELECT 
                      * 
                    FROM
                      Customers
                    WHERE
                      TransactionDate > '2016–04–01'";

  SqlCommand cmd = new SqlCommand(query, conn);
  SqlDataReader reader = cmd.ExecuteReader();

  while (reader.Read())
  {
    //Process my rows and put them into a model object
  }
}

With Entity Framework and LINQ (Language Integrated Query), I can instead write something like this:

DatabaseContext db = new DatabaseContext();

var customers = db.Customer.Where(c => c.TransactionDate > new DateTime(2016–04–01)).ToList();

As a developer I want to be writing code like the latter example — look at how clear and concise it is! However, when I first started using Entity Framework I quickly came across specific scenarios where the ORM was not working like I would have expected. Below are my top 3 insights about Entity Framework that I had to learn the hard way.

1. Select() is your friend

When I first started using Entity Framework I was enamored by the idea of never having to write boring SQL connection and mapping code again. During that time, I was frequently writing quick concise LINQ queries like this all over the place:

var results = db.Customer.Where(c => c.State == "Ohio");

All things were great for a while until I moved my code to production, where there was a lot more data, and my webpage started taking a very long time to return results. What was the problem?

My customer summary webpage was only displaying 4 columns from the Customer table to display summary information about customers: "State", "Name", "Address", and "Email". However, my Customer table had 50 columns in it. I foolishly thought since I was only using 4 columns from the table, Entity Framework would be smart enough to only bring those 4 columns of data back from the SQL server. Turns out I was wrong — what really was running behind the scenes was this:

SELECT
  *
FROM
  Customer
WHERE
  State = 'Connecticut'

Egad! I was running a SELECT *, which all SQL practitioners know is a big no-no both for performance as well as future maintainability of your queries!

One of the issues (features?) with Entity Framework is that by default the queries all run as SELECT *. Instead of bringing back all of the columns in a table, I figured I could limit my results by using Select() to choose which columns I wanted to return:

var results = db.Customer
                .Select(x => new 
                {
                  x.State,
                  x.Name,
                  x.Address,
                  x.Email
                });

And there was much rejoicing amongst the DBAs.

2. Lazy Loading can be great or terrible

Lazy Loading is the idea that data isn't pulled from the database until it is needed (hence the term "lazy"). This is a great feature if used correctly since it minimizes the amount of data that is transferred over the network. However, if you use lazy loading (which is often the default) unknowingly, you might be querying the SQL server more than is necessary.

Let's say our Customer and Order models looks like this, where every Customer has many associated Orders:

public class Customer
{
  [Key]
  public int Id { get; set; }
  public string State { get; set; }
  public string Name { get; set; }
  public virtual ICollection<Order> Orders { get; set; }
}

public class Order
{
  [Key]
  public int Id { get; set; }
  public int CustomerId { get; set; }

  [ForeignKey("CustomerId")]
  public virtual Customer Customer { get; set; }

  public int TotalItems { get; set; }
  public decimal TotalCost { get; set; }
}

If we write a LINQ query to bring back our Customer data and only use Customer data in our view, SQL just executes 1 query and things are efficient:

Controller:

var result = db.Customer.ToList();

View:

@foreach (var item in Model) {
  <tr>
    <td>
      @Html.DisplayFor(modelItem => item.State)
    </td>
    <td>
      @Html.DisplayFor(modelItem => item.Name)
    </td>
  </tr>
}

Result:

Bringing back fields exclusive to the Customer model.

SQL Trace:

One "SQL:BatchCompleted" means only a single query ran.

As you can see, Entity Framework only ran one query against the database and we are getting good performance. Very nice!

If we go ahead and change our View to display some Order data though:

View:

@foreach (var item in Model) {
  <tr>
    <td>
      @Html.DisplayFor(modelItem => item.State)
    </td>
    <td>
      @Html.DisplayFor(modelItem => item.Name)
    </td>
<td>
      @Html.DisplayFor(modelItem => item.Orders.First().TotalCost)
    </td>
  </tr>
}

Result:

Now EF nows to bring dollar amounts back from our Order model.

SQL Trace:

The Order query is executed once for every customer — this could result in horrible performance.

Because of Lazy Loading, entity framework runs a query to retrieve an order for each customer! In my example data with 5 Customers with 1 Order each, Entity Framework runs 6 queries: 1 query to retrieve our Customers, and then 5 queries to retrieve orders for each of customers! Imagine if we were display more than 5 customer records — our SQL server would be getting slammed with queries!

The solution to this is to Eager Load the data. This means all of the data will be brought back in our initial query via a SQL JOIN. In order to do this, we just use the Include() method to retrieve all of the data up front:

Controller:

var result = db.Customer.Include(x => x.Orders).ToList();

Trace:

EF generated SQL might be ugly, but it is efficient in this case.

Entity Framework only executes 1 query. This causes a little bit more upfront overhead since it transfers the Order data all at once, which would be a waste if we didn't use that Order data in our View, but if we do plan to use Order data in our View, then using Include() reduces the number of queries that will run against the SQL server.

3. Sometimes it's OK to not use Entity Framework LINQ queries

By default, Entity Framework generates some pretty interesting queries (interesting at least in the sense that no normal human being would structure a query like this). For example, in SQL I could write:

SELECT
  State,
  Name,
  Address,
  Email
FROM
  Customer
WHERE
  CreateDate > '2016–04–01'

And the query generated by Entity Framework might look something like this:

-- Region Parameters
DECLARE @p__linq__0 DateTime2 = '2016–04–01 00:00:00.0000000'
-- EndRegion

SELECT
  [Extent1].[State] AS [State],
  [Extent1].[Name] AS [Name],
  [Extent1].[Address] AS [Address],
  [Extent1].[Email] AS [Email]
FROM 
  [dbo].[Customer] AS [Extent1]
WHERE 
  [Extent1].[CreateDate] > @p__linq__0

Extents? Parameters? Aliases? What's all of this extra junk in my query? Although the latter SQL statement works, it isn't the prettiest on the eyes. And this is the example of simple query. Once you start writing complex LINQ queries using Entity Framework that involve the likes of Join(), GroupBy(), etc… the raw SQL that gets generated can get hairy and inefficient. In those cases, I've learned that it is perfectly to rely on good old t-sql stored procedures:

string query = "EXEC dbo.USP_NiceAndCleanSqlQuery";

db.Database.ExecuteSqlCommand(query);

At the end of the day, we are using Entity Framework to make our lives easier. If a LINQ query gets too complicated or is generating inefficient queries, it is always fine to fall back on an old standby.

2.5 Ways Your ORM Is Vulnerable To SQL Injection