EF CTP4 Tips & Tricks: Mapping to an Existing Database

This is the third in a series of posts about the recently released Entity Framework Feature CTP4 (AKA “EF Magic Unicorn Edition”).

Most of the samples/demos of Code First so far have focused on creating a new database that is generated from a set of classes, however Code First also can be used in scenarios where you have an existing database.

For the majority of folks the existing designer/edmx based reverse engineering process (often referred to as Database First) already provides a good story for generating a model based on an existing database, with the option to tweak the object model and mapping. There are however some folks who would rather specify mapping using code than a designer.

You can download the completed VS2010 project from this post.

The Schema

Let’s start by looking at the schema we are going to map to, I’ve intentionally used table/column naming that makes it clear what is happening when we start mapping in code… I’m definitely not implying it’s good database design to post-fix table names with “Table” Smile

The schema is pretty straightforward, there are a couple of things to call out:

PostTagsTable represents a many to many relationship between Tags and Posts
UserTable is used to store two types of Users
- Normal Users will have a value of “U” in the user_type column and should have ‘null’ in the admin_from column
- Administrators will have a value of “A” in the user_type column and will have a valid date value in the admin_from column

The Object Model

Let’s define a set of classes that we are going to use EF to map to. We want our classes to be completely independent of whatever data access technology we are using so I’m just writing my own POCO classes.

The object model is pretty self explanatory, a couple of points:

For the Post->Blog relationship we are not exposing a foreign key property on Post but for the Post->User relationship we are (Post.AuthorId)
(This is purely to show the difference in mapping, I’d recommend being consistent one way or the other throughout your model)
For the Post->User relationship we are only exposing a navigation property on one side (i.e. there is a Post.Author but no User.Posts)
There is no class corresponding to the PostTagsTable, just collection properties on the Post and Tag classes to represent the relationship
Although Users and Admins are stored in the same table in the database we have split them out into an inheritance hierarchy in the object model

public class Blog
{
    public int BlogId { get; set; }
    public string Name { get; set; }

    public ICollection<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Extract { get; set; }
    public string Body { get; set; }

    public Blog Blog { get; set; }

    public ICollection<Tag> Tags { get; set; }

    public int AuthorId { get; set; }
    public User Author { get; set; }
}

public class Tag
{
    public string TagText { get; set; }

    public ICollection<Post> Posts { get; set; }
}

public class User
{
    public int UserId { get; set; }
    public string UserName { get; set; }
}

public class Admin : User
{
    public DateTime AdminFrom { get; set; }
}

The Context

Before we dive into mapping, we need to define a context to use for interaction with the database.

A couple of points to note:

I’m exposing a DbConnection based constructor just to keep everything in code for this post
The easier alternative is to use the default constructor and add a connection string in your app/web.config file with the name “BlogContext”EF will match the app/web.config entry with the context name at runtime and use it
I’m overriding the protected OnModelCreating method which gives us the opportunity to tweak the model that is built via convention, this method is where we will add our mapping code

public class BlogContext : DbContext
{
    public BlogContext(DbConnection connection)
        : base(connection)
    { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // TODO: Insert mapping here
    }

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
    public DbSet<Tag> Tags { get; set; }
    public DbSet<User> Users { get; set; }
    public DbSet<Admin> Admins { get; set; }
}

Switch Off Database Initialization

DbContext has some smarts built in to try and help you out with managing database initialization, because we are taking care of managing the database schema we should turn that off.

The initialization strategy is an AppDomain wide setting for each context type so this code should execute prior to constructing any instances of the context in your application:

Database.SetInitializer<BlogContext>(null);

Mapping

If we tried to use our context for data access now we’d get a bunch of exceptions because the database we defined at the start doesn’t match what EF is expecting. For example Code First has used a convention to decide that the table for Blog should be called Blogs, which is fine if we let Code First create the database but not so good if we already have a table called BlogTable. In fact the first exception we get would tell us the Code First doesn’t know what the primary key property is for Tag because it doesn’t follow the Id or <ClassName>Id convention.

There is however a bunch of stuff that the conventions will correctly take care of for us, like detecting that Blog.BlogId is a primary key and set as an Identity column in the database. Let’s take a look at the mapping code we would add in the OnModelCreating method for each of our five classes.

Blog

Blog is pretty simple to configure, first we are specifying that Name is a required property in the database and has a maximum length of 50.

modelBuilder.Entity<Blog>()
    .Property(b => b.Name)
    .IsRequired()
    .HasMaxLength(50);

Secondly we are specifying which table it maps to and how the properties map to the columns. Notice the anonymous type projection syntax for specifying which properties are mapped to the database and what their respective column names are. On the upside this syntax is concise and reads pretty well if you look at the code, on the down side it isn’t very discoverable when you look at the method signature for MapSingleType; public EntityMap MapSingleType(Expression<Func<TEntity, object>> propertyMap). Any feedback or comments you have on this syntax will help us shape the Fluent API in the future.

modelBuilder.Entity<Blog>()
    .MapSingleType(b => new { blog_id = b.BlogId, name = b.Name })
    .ToTable("BlogTable");

Post

Post is probably the most interesting mapping, we have a few properties to configure which are just variations on code we have already seen.

modelBuilder.Entity<Post>()
    .Property(p => p.Title)
    .IsRequired()
    .HasMaxLength(50);

modelBuilder.Entity<Post>()
    .Property(p => p.Extract)
    .HasMaxLength(500);

modelBuilder.Entity<Post>()
    .Property(p => p.Body)
    .IsRequired();

Next we are configuring the many to many relationship between Post and Tags, here we see the relationship API in action and also the syntax for specifying how the join table of a many to many relationship is controlled.

modelBuilder.Entity<Post>()
    .HasMany(p => p.Tags)
    .WithMany(t => t.Posts)
    .Map("PostTagsTable", (p, t) => new
    {
        post_id = p.PostId,
        tag_id = t.TagText
    });

The one to many relationship between Post and User only exposed a navigation property on Post. We can still specify that Users have many Posts, even though there is no User.Posts property, by using the parameterless overload of HasMany(). We also exposed a foreign key property on the Post class so we need to specify that Post.Author and Post.AuthorId both represent the same relationship, we use the HasConstraint method to do this.

modelBuilder.Entity<Post>()
    .HasRequired<User>(p => p.Author)
    .WithMany()
    .HasConstraint((p, u) => p.AuthorId == u.UserId);

Finally we specify the table and column mappings. Note how the syntax for specifying foreign key columns differs depending on whether a corresponding foreign key property was exposed on the object. If there is a foreign key property in the object mdoel then we can just map that directly to the column (author_id = p.AuthorId), if not then we can dot through the navigation property to the primary key of the related entity (blog_id = p.Blog.BlogId).

modelBuilder.Entity<Post>()
    .MapSingleType(p => new
    {
        post_id = p.PostId,
        title = p.Title,
        extract = p.Extract,
        body = p.Body,
        author_id = p.AuthorId,
        blog_id = p.Blog.BlogId
    })
    .ToTable("PostTable");

Tag

Because Tag doesn’t have an obvious primary key property for Code First to detect by convention we need to configure which property is the key.

modelBuilder.Entity<Tag>()
    .HasKey(t => t.TagText);

Apart from the key the rest is just a simple table and column mapping.

modelBuilder.Entity<Tag>()
    .MapSingleType(t => new
    {
        tag_id = t.TagText
    })
    .ToTable("TagTable");

User & Admin

We map User and Admin at the same time because they form part of the same inheritance hierarchy. Note that we are now using the MapHierarchy method rather than MapSingleType. We then use the Case<T> methods to specify how the two types map to the same table. We specify the values that should go in the discriminator column (user_type) by specifying constants in each mapping fragment. Also note that we don’t need to reconfigure properties as we move up the inheritance hierarchy (i.e. in the Admin block we don’t need to map ‘user_id’ or ‘user_name’ because we already mapped them for the base type).

modelBuilder.Entity<User>()
    .Property(u => u.UserName)
    .IsRequired()
    .HasMaxLength(50);

modelBuilder.Entity<User>()
    .MapHierarchy()
    .Case<User>(u => new
    {
        user_id = u.UserId,
        user_name = u.UserName,
        user_type = "U"
    })
    .Case<Admin>(a => new
    {
        admin_from = a.AdminFrom,
        user_type = "A"
    })
    .ToTable("UserTable");

Access Data

With all the mapping specified we are now ready to start accessing data:

using (var connection = new SqlConnection(@"Server=.\SQLEXPRESS;Database=ExistingBlogDb;Trusted_Connection=True;"))
{
    using (var context = new BlogContext(connection))
    {
        var blog = new Blog { Name = "romiller.com" };
        var me = new Admin { UserName = "Rowan", AdminFrom = DateTime.Today };
        var ctp4 = new Tag { TagText = "CTP4" };

        var post = new Post
        {
            Title = "Mapping to an existing database",
            Extract = "Blah blah blah",
            Body = "Lost of useful info",
            Blog = blog,
            Author = me,
            Tags = new List<Tag> { ctp4 }
        };

        context.Posts.Add(post);
        context.SaveChanges();
    }
}

Configuration Classes

Obviously our OnModelCreating method is getting pretty huge and this is only a small model, this is where derived configuration classes come in handy.

For example I could define the following configuration for Blog:

public class BlogConfig : EntityConfiguration<Blog>
{
    public BlogConfig()
    {
        this.Property(b => b.Name)
            .IsRequired()
            .HasMaxLength(50);

        this.MapSingleType(b => new { blog_id = b.BlogId, name = b.Name })
            .ToTable("BlogTable");
    }
}

And then just register it in the OnModelCreating method:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Configurations.Add(new BlogConfig());
    ...
}

Code Generation

If we had a big database with a lot of tables then writing this configuration code is going to become a painful (probably unrealistic) task, in much the same way mapping a huge model to a database from scratch using the designer or xml wouldn’t be a fun task. For this very reason the designer will automatically generate a default 1:1 mapping for you as a starting point. So the logical next step would be to have something analyze your database and give you an object model and mapping code that you could then start to tweak… for a little side project I’m playing around with a generator that does exactly this, it’s not quite ready to share yet but I’ll post it up shortly.

Summary

Mapping to an existing database is achievable with Code First, although the edmx/designer based Database First approach is probably a better fit for the majority of folks. If your database schema varies a lot from the default Code First conventions then you will end up writing quite a bit of code. It is possible to write a generator that will spit out a starting point to reduce the amount of code you need to write by hand, I’ll be sharing the results of a pet project on this shortly.