EF6.1–Workaround Trailing Blanks Issue in String Joins

A semi-common issue that folks hit with Entity Framework is that the in-memory semantics for “relationship fixup” are different than how SQL Server performs joins with regard to trailing spaces in strings. Relationship fixup is the process where EF looks at primary and foreign key values and populates navigation properties when they match.

 

The Problem

The issue is that SQL Server ignores trailing spaces when comparing strings but the .NET string functions (which EF uses) does not. On SQL Server “ABCD” and “ABCD    “ are considered equal, but in .NET (and EF) they are not.

The implications of this are best illustrated with an example. Consider the following code that uses a Group and User model with a string based foreign key relationship between them. The code creates a database and populates it with one Group and a couple of Users. The primary key of the Group has trailing spaces but the foreign key values in the User instances do not.

using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
using System.Linq;

namespace FixedLengthDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new UserContext())
            {
                if (!db.Database.Exists())
                {
                    db.Database.Create();

                    db.Groups.Add(new Group { GroupId = "CoolKids               ", Name = "The cool kids" });
                    db.Users.Add(new User { UserId = "JohnC", Name = "John Citizen", GroupId = "CoolKids" });
                    db.Users.Add(new User { UserId = "JaneC", Name = "Jane Citizen", GroupId = "CoolKids" });
                    db.SaveChanges();
                }
            }

            using (var db = new UserContext())
            {
                var groups = db.Groups.Include(g => g.Users).ToList();

                Console.WriteLine("Groups in memory: " + db.Groups.Local.Count);
                Console.WriteLine("Users in memory: " + db.Users.Local.Count);
                Console.WriteLine("Users in navigation property: " + groups[0].Users.Count);
            }
        }
    }

    public class UserContext : DbContext
    {
        public DbSet<User> Users { get; set; }
        public DbSet<Group> Groups { get; set; }
    }

    public class User
    {
        public string UserId { get; set; }
        public string Name { get; set; }

        public string GroupId { get; set; }
        public Group Group { get; set; }
    }

    public class Group
    {
        public Group()
        {
            this.Users = new List<User>();
        }

        public string GroupId { get; set; }
        public string Name { get; set; }

        public List<User> Users { get; set; }
    }
}

 

When we run the console application we see that all Groups and Users are retrieved from the database (since SQL Server will ignore the trailing spaces and successfully join data from the two tables). However, the Users navigation property on Group is not populated because relationship fixup will not match the values due to trailing spaces.

Groups in memory: 1
Users in memory: 2
Users in navigation property: 0

 

Intrusive Fixes

There are of course some intrusive fixes you could make, and these are things we have recommended in the past:

  • Use a view (or other intermediary database structure) to ‘fix’ the data by removing trailing blanks.
  • Fix the data in the database by removing trailing blanks.
  • Compensate in application code for this limitation of EF.

 

A Better Fix

In EF6.1 we can use query interceptors and publically constructible query trees to resolve this issue without having to compensate in our database or product code.

The Interceptor

Here is the code for an interceptor that detects any string columns being accessed and applies a trim function to have white space removed in the query.

using System.Data.Entity.Core.Common.CommandTrees;
using System.Data.Entity.Core.Common.CommandTrees.ExpressionBuilder;
using System.Data.Entity.Core.Metadata.Edm;
using System.Data.Entity.Infrastructure.Interception;
using System.Linq;

namespace FixedLengthDemo
{
    public class StringTrimmerInterceptor : IDbCommandTreeInterceptor
    {
        public void TreeCreated(DbCommandTreeInterceptionContext interceptionContext)
        {
            if (interceptionContext.OriginalResult.DataSpace == DataSpace.SSpace)
            {
                var queryCommand = interceptionContext.Result as DbQueryCommandTree;
                if (queryCommand != null)
                {
                    var newQuery = queryCommand.Query.Accept(new StringTrimmerQueryVisitor());
                    interceptionContext.Result = new DbQueryCommandTree(
                        queryCommand.MetadataWorkspace,
                        queryCommand.DataSpace,
                        newQuery);
                }
            }
        }

        private class StringTrimmerQueryVisitor : DefaultExpressionVisitor
        {
            private static readonly string[] _typesToTrim = { "nvarchar", "varchar", "char", "nchar" };

            public override DbExpression Visit(DbNewInstanceExpression expression)
            {
                var arguments = expression.Arguments.Select(a =>
                {
                    var propertyArg = a as DbPropertyExpression;
                    if (propertyArg != null && _typesToTrim.Contains(propertyArg.Property.TypeUsage.EdmType.Name))
                    {
                        return EdmFunctions.Trim(a);
                    }

                    return a;
                });

                return DbExpressionBuilder.New(expression.ResultType, arguments);
            }
        }
    }
}

Registering the Interceptor

Now that we have an interceptor, we need to tell EF to use it. This is best done via Code-Based Configuration. We can just drop the following class in the same assembly/project as our context and EF will pick it up.

using System.Data.Entity;

namespace FixedLengthDemo
{
    public class MyConfiguration : DbConfiguration
    {
        public MyConfiguration()
        {
            AddInterceptor(new StringTrimmerInterceptor());
        }
    }
}

 

The Result

When we run the app we’ll see that relationship fixup now occurs and the navigation properties are populated.

Groups in memory: 1
Users in memory: 2
Users in navigation property: 2

 

If we inspect the SQL that is generated we see the trimming is done in the database. Here is the query fragment that selects a property:

LTRIM(RTRIM([Extent1].[GroupId])) AS [GroupId]

 

We’ll also see that the JOIN operators still work on the untrimmed strings, since SQL will already compensate for the trailing spaces. Here is the query fragment for the join:

LEFT OUTER JOIN [dbo].[Users] AS [Extent2] ON [Extent1].[GroupId] = [Extent2].[GroupId]