Reducing Code First Database Chatter

Code First has automatic functionality built in to help you get up and running with a database and warn you when you may have made a mistake that will cause your app to fail. This is generally helpful when you are developing an application – especially when you are new to EF – but when it comes time to deploy it’s just extra logic and database round trips that you don’t need to run.

This logic only happens once per AppDomain for each context type, so the overhead is usually pretty minimal and turning off the features is more of an academic correctness thing than something that will actually give you a noticeable boost in performance.

 

This post is about “how it is” not “how it should be”

The magic that Code First does around database initialization is the cause of much confusion and questions. It’s also a complex part of our code base that is brittle and we have often broken things unintentionally. In addition, while some of the magic is good for local development, it doesn’t set you up with a good solution for database management during deployment. For these reasons, we have some changes planned for this area in EF7. You can read more about our plans for EF7 on our team blog.

You’ll also notice that some of the queries we send could be removed/optimized for the common case where the database is created and up to date. We’ve made some improvements in this area, but there are still things we could do better.

 

What happens by default

The following table shows the database interactions that occur from running the code listed in ‘The Code’ section at the end of this post.

Event# EventClass TextData
1 SQL:BatchCompleted
select serverproperty('EngineEdition')

 

2 SQL:BatchCompleted
SELECT Count(*) FROM sys.databases WHERE [name]=N'BloggingSample.BloggingContext'

 

3 SQL:BatchCompleted
SELECT Count(*)
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_SCHEMA + '.' + t.TABLE_NAME IN ('dbo.Blogs')
    OR t.TABLE_NAME = 'EdmMetadata'

 

4 RPC:Completed
exec sp_executesql N'SELECT 
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT 
        COUNT(1) AS [A1]
        FROM [dbo].[__MigrationHistory] AS [Extent1]
        WHERE [Extent1].[ContextKey] = @p__linq__0
    )  AS [GroupBy1]'

 

5 RPC:Completed
exec sp_executesql N'SELECT TOP (1) 
    [Project1].[C1] AS [C1], 
    [Project1].[MigrationId] AS [MigrationId], 
    [Project1].[Model] AS [Model], 
    [Project1].[ProductVersion] AS [ProductVersion]
    FROM ( SELECT 
        [Extent1].[MigrationId] AS [MigrationId], 
        [Extent1].[Model] AS [Model], 
        [Extent1].[ProductVersion] AS [ProductVersion], 
        1 AS [C1]
        FROM [dbo].[__MigrationHistory] AS [Extent1]
        WHERE [Extent1].[ContextKey] = @p__linq__0
    )  AS [Project1]
    ORDER BY [Project1].[MigrationId] DESC'

 

6 SQL:BatchCompleted
SELECT TOP (1) 
    [c].[BlogId] AS [BlogId], 
    [c].[Url] AS [Url]
    FROM [dbo].[Blogs] AS [c]

 

7 SQL:BatchCompleted
SELECT TOP (1) 
    [c].[BlogId] AS [BlogId], 
    [c].[Url] AS [Url]
    FROM [dbo].[Blogs] AS [c]

 

Events 1 thru 5 are run the first time a context is used to perform an operation in the AppDomain. In our scenario they are triggered by the LINQ query that is run on Line 12 of the code listing.

  • Event #1: The logic that builds your Code First model needs to know what version of the database you are targeting. Different versions of SQL Server support different features and datatypes, so Code First needs to know what you are targeting in order to build a compatible model.Note: Needing to know what version of the database server you are targeting is a result of how EF is implemented. In future versions of EF we will be removing this requirement.
  • Event #2: Code First has logic to create the database for you if it doesn’t exist, so the first thing it does is check if the database is present. In our scenario the database already existed, but if it didn’t we would see some following commands to create the database and tables in place of Events 3 thru 5.
  • Event #3:  Because we have an existing database, EF will now check if the database has been initialized with any of the tables from our model. We do this because you may have pointed Code First to an existing database with no tables or with tables from another model (e.g. ASP.NET Identity). In our scenario the tables already exist, so Code First will move on to compatibility checks.Note: The EdmMetadata table you see included in this query is a legacy concept that Code First used before Migrations was enabled. This would only ever be present in databases created with EF 4.1 and 4.2.
  • Event #4: Next, EF checks if the __MigrationHistory table exists and has data in it for our context. If this table is present then EF can check if the model has changed since the last migration was applied to the database.
  • Event #5: Finally, since the __MigrationHistory table will have data for our context, it gets the most recent entry to check if changes have been made to the model after the last migration was applied. It does this using the model snapshot that is stored in the Model column for each entry in the __MigrationHistory table. If the model and database are out of sync, EF will throw an exception informing you to use migrations to apply the changes to the database.

Events 6 and 7 are specific to our application code.

  • Event #6: This is the database query that results from the LINQ query to fetch a blog from the database.
  • Event #7: Because the database initialization only runs once per AppDomain, this second query only results in the corresponding database query, even though it is run on a new instance of our context type.

 

Switching off database initialization (eliminate event #2 thru #5)

If I’m targeting an existing database (not created by Code First) or if I know what I’m doing with Code First Migrations then I’m going to make sure that my database is always created and up-to-date. For this reason I can switch off the default database initialization logic (thus removing Event #2 thru Event #5).

To do this, I register a null database initializer for my context. The easiest way to do this is using Code-Based Configuration – including a class deriving from DbConfiguration in the same assembly as my context.

using System.Data.Entity;

namespace Sample
{
    public class MyConfig : DbConfiguration
    {
        public MyConfig()
        {
            SetDatabaseInitializer<BloggingContext>(null);
        }
    }
}

 

Alternatively, rather than setting the initializer to null, I can assign an instance of NullDatabaseInitializer. Both of these achieve exactly the same thing.

SetDatabaseInitializer<BloggingContext>(new NullDatabaseInitializer<BloggingContext>());

 

You can also configure database initializers from your App/Web.config file. For more info, see Entity Framework Config File Settings.

 

Avoiding the database version query (eliminate event #1)

There is no way to determine the version of SQL Server being targeted from a DbConnection. This is why EF generates Event #1 to determine the version of SQL Server being targeted. If I know the version, I can short circuit this though and return a hard-coded value, thus eliminating Event #1.

The version is known as a ‘provider manifest token’ and is retrieved through an IManifestTokenResolver service. I can register my own implementation to just return a hard coded value for the servers I know about. In my case, I know that .\SQLEXPRESS is 2012.

using System.Data.Common;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;

namespace BloggingSample
{
    public class MyConfig : DbConfiguration
    {
        public MyConfig()
        {
            SetDatabaseInitializer<BloggingContext>(null);
            SetManifestTokenResolver(new MyManifestTokenResolver());
        }

        public class MyManifestTokenResolver : IManifestTokenResolver
        {
            private readonly IManifestTokenResolver _defaultResolver = new DefaultManifestTokenResolver();

            public string ResolveManifestToken(DbConnection connection)
            {
                var sqlConn = connection as SqlConnection;
                if(sqlConn != null && sqlConn.DataSource == @".\SQLEXPRESS")
                {
                    return "2012";
                }
                else
                {
                    return _defaultResolver.ResolveManifestToken(connection);
                }
            }
        }
    }
}

We can make our solution a little more generic because for most cases you can safely return 2008 for all SQL Server instances, rather than special casing each server.

public string ResolveManifestToken(DbConnection connection)
{
    var sqlConn = connection as SqlConnection;
    if(sqlConn != null)
    {
        return "2008";
    }
    else
    {
        return _defaultResolver.ResolveManifestToken(connection);
    }
}

 

The Code

Here is the program that was run to get a profile of the database interaction events that occur.

using System.Data.Entity;
using System.Linq;

namespace BloggingSample
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new BloggingContext())
            {
                var blog = db.Blogs.FirstOrDefault();
            }

            using (var db = new BloggingContext())
            {
                var blog = db.Blogs.FirstOrDefault();
            }
        }
    }

    public class BloggingContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
    }

    public class Blog
    {
        public int BlogId { get; set; }
        public string Url { get; set; }
    }
}