Expression trees and advanced queries in C# 02 - IQueryable composition

Expression trees and advanced queries in C# 02 IQueryable composition [2017 May 14] .NET, C#, SQL, IQueryable, Composition

Part 1 - Expression Tree basics
Sample project containing a demonstration of the theory explained in this article is available on GitHub
Database used in examples is for Adventure Works - a hypothetical bike production company

In the previous part we have determined that:

IQueryably consists of a Provider and an Expression Tree
Expression Trees can be combined almost as easily as pieces of C# code

And as a result, IQueryables are easily composable.

In this article, we will look at treating our queries are reusable chunks of logic and combining them into more complex yet still readable queries like this:

public IQueryable<ProductModelOrderStatisticsDto> GetProductModelOrderStats()
{
    // a bigger, more detailed query
    IQueryable<WorkOrderSummaryDto> allDurationsAndRoutings =
                                                    GetWorkOrderSummaries();

    // is wrapped by an aggregation to retrieve statistics
    var averagePerModel = allDurationsAndRoutings
                .GroupBy(x => new { x.ProductModelId, x.ModelName })
                .Select(x => new ProductModelOrderStatisticsDto
                {
                    ModelId = x.Key.ProductModelId,
                    ModelName = x.Key.ModelName,
                    AverageDuration = x.Where(y => y.DurationDays.HasValue)
                                        .Average(y => y.DurationDays.Value),
                    AverageRoutings = x.Average(y => y.RoutingsCount)
                });

    return averagePerModel;
}

Let’s look at an average complexity query that is used to get a detailed view of orders for a hypothetical enterprise.

public IQueryable<WorkOrderSummaryDto> GetWorkOrderSummaries()
{
IQueryable<WorkOrderSummaryDto> allDurationsAndRoutings = 
        from model in DataContext.ProductModels
        join product in DataContext.Products
            on model.ProductModelId equals product.ProductModelId
        join workOrder in DataContext.WorkOrders
            on product.ProductId equals workOrder.ProductId
        join workOrderRouting in DataContext.WorkOrderRoutings
            on workOrder.WorkOrderId equals workOrderRouting.WorkOrderId
            into routingsPerOrder

        let duration = 
              DbFunctions.DiffDays(workOrder.StartDate, workOrder.EndDate)

        orderby duration descending

        select new WorkOrderSummaryDto
        {
            ProductModelId = model.ProductModelId,
            ModelName = model.Name,
            ProductId = product.ProductId,
            ProductName = product.Name,
            OrderId = workOrder.WorkOrderId,
            DurationDays = duration,
            RoutingsCount = routingsPerOrder.Count(),
            Locations = DataContext
                            .Locations
                            .Where(y => routingsPerOrder
                                            .Select(z => z.LocationId)
                                            .Contains(y.LocationId))
                            .Select(z => z.Name)
                            .Distinct()
        };

return allDurationsAndRoutings;
}

Notice that this query uses a combination of Query Syntax and Method Syntax of LINQ (More on those). To dispel a common misconception, I must mention, there is no difference in the results you get from using this two forms .

Both forms will produce either an IEnumerable<T> or an IQueryable<T> based on which sources of data are accessed in the process and what transformations are applied to them. In our case, data sources are Entity Framework 6 context data sets, which implement IQueryable<T>, and the result is an IQueryable accordingly.

From my experience, Query Syntax is a lot easier to work with when querying relational databases due to the way it handles joins and visibility scopes. I.E. you can declare a join or let statement once at the top of the query and use it once at the very bottom, whereas with Method Syntax anything you declare is limited to the scope of the declaring method and will have to be passed to the next as part of some intermediate context. But otherwise, up to my knowledge, anything you can express in Query Syntax you can also express in Method Syntax. The true beauty comes out when you combine the two, as shown in the example.

The binding of Locations showcases the first kind of IQueryable composition, as you can see that the code producing it looks like a subquery. As we will see soon, you might have just as easily first produced a standalone IQueryable, and then plugged it in here, but an inline query suits our needs more, because it uses routingsPerOrder join result internally.

Considering the amount of different logics and different approaches utilized in this query, it is quite impressive that our IQueryable Provider (EF6), is able to combine all this C# code (compiler generated expressions of it to be precise) and translate it into a single SQL query (don't read into it too hard):

SELECT 
[Project6].[C2] AS [C1], 
[Project6].[ProductModelID] AS [ProductModelID], 
[Project6].[Name] AS [Name], 
[Project6].[ProductID] AS [ProductID], 
[Project6].[Name1] AS [Name1], 
[Project6].[WorkOrderID] AS [WorkOrderID], 
[Project6].[C1] AS [C2], 
[Project6].[C4] AS [C3], 
[Project6].[C3] AS [C4], 
[Project6].[Name2] AS [Name2]
FROM ( SELECT 
    [Limit1].[ProductModelID] AS [ProductModelID], 
    [Limit1].[Name] AS [Name], 
    [Limit1].[ProductID] AS [ProductID], 
    [Limit1].[Name1] AS [Name1], 
    [Limit1].[WorkOrderID] AS [WorkOrderID], 
    [Limit1].[C1] AS [C1], 
    [Limit1].[C2] AS [C2], 
    [Distinct1].[Name] AS [Name2], 
    CASE WHEN ([Distinct1].[Name] IS NULL) THEN CAST(NULL AS int) 
        ELSE 1 END AS [C3], 
    [Limit1].[C3] AS [C4]
    FROM   (SELECT TOP (1000) 
        [Project3].[ProductModelID] AS [ProductModelID], 
        [Project3].[Name] AS [Name], 
        [Project3].[ProductID] AS [ProductID], 
        [Project3].[Name1] AS [Name1], 
        [Project3].[WorkOrderID] AS [WorkOrderID], 
        [Project3].[C1] AS [C1], 
        [Project3].[C2] AS [C2], 
        [Project3].[C3] AS [C3]
        FROM ( SELECT 
            [Project2].[ProductModelID] AS [ProductModelID], 
            [Project2].[Name] AS [Name], 
            [Project2].[ProductID] AS [ProductID], 
            [Project2].[Name1] AS [Name1], 
            [Project2].[WorkOrderID] AS [WorkOrderID], 
            [Project2].[C1] AS [C1], 
            1 AS [C2], 
            [Project2].[C2] AS [C3]
            FROM ( SELECT 
              [Project1].[ProductModelID] AS [ProductModelID], 
              [Project1].[Name] AS [Name], 
              [Project1].[ProductID] AS [ProductID], 
              [Project1].[Name1] AS [Name1], 
              [Project1].[WorkOrderID] AS [WorkOrderID], 
              [Project1].[C1] AS [C1], 
              (SELECT 
                  COUNT(1) AS [A1]
                  FROM [Production].[WorkOrderRouting] AS [Extent4]
                  WHERE [Project1].[WorkOrderID] = [Extent4].[WorkOrderID]) 
                                                                    AS [C2]
              FROM ( SELECT 
                  [Extent1].[ProductModelID] AS [ProductModelID], 
                  [Extent1].[Name] AS [Name], 
                  [Extent2].[ProductID] AS [ProductID], 
                  [Extent2].[Name] AS [Name1], 
                  [Extent3].[WorkOrderID] AS [WorkOrderID], 
                  DATEDIFF (day, [Extent3].[StartDate], [Extent3].[EndDate]) 
                                                                    AS [C1]
                  FROM   [Production].[ProductModel] AS [Extent1]
                  INNER JOIN [Production].[Product] AS [Extent2] 
                      ON [Extent1].[ProductModelID] = [Extent2].[ProductModelID]
                  INNER JOIN [Production].[WorkOrder] AS [Extent3] 
                      ON [Extent2].[ProductID] = [Extent3].[ProductID]
              )  AS [Project1]
            )  AS [Project2]
        )  AS [Project3]
        ORDER BY [Project3].[C1] DESC ) AS [Limit1]
    OUTER APPLY  (SELECT DISTINCT 
        [Extent5].[Name] AS [Name]
        FROM [Production].[Location] AS [Extent5]
        WHERE  EXISTS (SELECT 
            1 AS [C1]
            FROM [Production].[WorkOrderRouting] AS [Extent6]
            WHERE ([Limit1].[WorkOrderID] = [Extent6].[WorkOrderID]) 
                    AND ([Extent6].[LocationID] = [Extent5].[LocationID])
        ) ) AS [Distinct1]
)  AS [Project6]
ORDER BY [Project6].[C1] DESC, 
            [Project6].[ProductModelID] ASC, 
            [Project6].[ProductID] ASC, 
            [Project6].[WorkOrderID] ASC, 
            [Project6].[C3] ASC

This may look slightly overwhelming at first. Luckily, you rarely need to peek inside the generated SQL code, and when you do - just follow the column names to match generated pieces to your C# code. Also notice, how good of a job our chosen provider does, utilizing features specific to the MS SQL Server currently being used, like OUTER APPLY. Also, in case you are worried about productivity, SQL databases are very smart about optimizing query plans, so this SQL query, despite looking complex compared to a hand-written one, will execute with negligible performance differences in most cases.

We've seen inline composition of IQueryables with Locations, but what if we wanted to have a query piece reusable by several other queries or even a standalone query that may also be used in composition? Let’s return to the example in preview.

public IQueryable<ProductModelOrderStatisticsDto> GetProductModelOrderStats()
{
    // a bigger, more detailed query
    IQueryable<WorkOrderSummaryDto> allDurationsAndRoutings =
                                                    GetWorkOrderSummaries();

    // is wrapped by an aggregation to retrieve statistics
    IQueryable<ProductModelOrderStatisticsDto> averagePerModel = 
            allDurationsAndRoutings
                .GroupBy(x => new { x.ProductModelId, x.ModelName })
                .Select(x => new ProductModelOrderStatisticsDto
                {
                    ModelId = x.Key.ProductModelId,
                    ModelName = x.Key.ModelName,
                    AverageDuration = x.Where(y => y.DurationDays.HasValue)
                                        .Average(y => y.DurationDays.Value),
                    AverageRoutings = x.Average(y => y.RoutingsCount)
                });

    return averagePerModel;
}

This query represents a typical situation, when, for example, you have a data grid and a statistics section associated with it. Maintaining the two queries separately puts a burden on the developer, since, if he would change the filter conditions for the grid (let's say, to exclude obsolete models), the statistics query would also have to be updated. Unless it incorporated the grid query, as shown above. You would think this leads to an inefficient SQL query, since much of the data used for the grid does not influence statistics, i.e. Locations property is not used at all. Luckily, better IQueryable Providers out there don't just translate expression trees into equal queries, but also analyze and optimize them, so the generated SQL for combined query is shorter than SQL for the base query.

SELECT 
[GroupBy2].[K1] AS [ProductModelID], 
[GroupBy2].[K2] AS [Name], 
(SELECT 
    AVG([Filter2].[A1]) AS [A1]
    FROM ( SELECT 
            CAST( DATEDIFF (day, 
                            [Extent7].[StartDate], 
                            [Extent7].[EndDate]) AS float) AS [A1]
        FROM   [Production].[ProductModel] AS [Extent5]
        INNER JOIN [Production].[Product] AS [Extent6] 
            ON [Extent5].[ProductModelID] = [Extent6].[ProductModelID]
        INNER JOIN [Production].[WorkOrder] AS [Extent7] 
            ON [Extent6].[ProductID] = [Extent7].[ProductID]
        WHERE ([GroupBy2].[K1] = [Extent5].[ProductModelID]) 
                AND ([GroupBy2].[K2] = [Extent5].[Name]) 
                AND (DATEDIFF (day, 
                            [Extent7].[StartDate], 
                            [Extent7].[EndDate]) IS NOT NULL)
    )  AS [Filter2]) AS [C1], 
[GroupBy2].[A1] AS [C2]
FROM ( SELECT 
    [Join2].[K1] AS [K1], 
    [Join2].[K2] AS [K2], 
    AVG([Join2].[A1]) AS [A1]
    FROM ( SELECT 
        [Extent1].[ProductModelID] AS [K1], 
        [Extent1].[Name] AS [K2], 
            CAST( (SELECT 
            COUNT(1) AS [A1]
            FROM [Production].[WorkOrderRouting] AS [Extent4]
            WHERE [Extent3].[WorkOrderID] = [Extent4].[WorkOrderID]) 
                                                    AS float) AS [A1]
        FROM   [Production].[ProductModel] AS [Extent1]
        INNER JOIN [Production].[Product] AS [Extent2] 
            ON [Extent1].[ProductModelID] = [Extent2].[ProductModelID]
        INNER JOIN [Production].[WorkOrder] AS [Extent3] 
            ON [Extent2].[ProductID] = [Extent3].[ProductID]
    )  AS [Join2]
    GROUP BY [K1], [K2]
)  AS [GroupBy2]

Up to my knowledge, you can compose IQueryables that use the same provider in any way that can be expressed in C# and the amount of composition is only constrained by computation resources at your disposal. Here is an example of what I mean.

public IQueryable<FreshProductWithBadReviewReportRow> 
                                            GetFreshProductsWithBadReviews()
{
    IQueryable<Product> freshProducts =  GetFreshProducts();

    IQueryable<ProductModel> freshModels = 
                                GetFreshModels(DateTime.Today.AddYears(-5));

    IQueryable<ProductIdAverageReviewScore> averageReviewScores = 
                                            GetProductAverageReviewScores();

    IQueryable<ProductReview> badReviews = GetBadReviews();

    IQueryable<ProductSubcategory> fullBikeSubcategories = 
                                                 GetFullBikeSubcategories();

    var query = from product in freshProducts
                join model in freshModels
                    on product.ProductModelId equals model.ProductModelId

                let averageReviewScore = 
                            averageReviewScores
                                    .Where(x => x.ProductId 
                                                        == product.ProductId)
                                    .Select(x => x.AverageReviewScore)
                                    .FirstOrDefault()

                let isAverageReviewScorePositive = averageReviewScore > 2.5

                where badReviews.Select(r => r.ProductId)
                                .Contains(product.ProductId)

                select new FreshProductWithBadReviewReportRow
                {
                    ProductId = product.ProductId,
                    ProductName = product.Name,
                    ProductModel = model.CatalogDescription,
                    IsFullyAssembledBike = fullBikeSubcategories
                                          .Any(sc => sc.ProductSubcategoryId 
                                                == product.ProductSubcategoryId),
                    AverageReviewScore = averageReviewScore,
                    IsAverageReviewScorePositive = isAverageReviewScorePositive

                };

    return query;
}

All this C# code still translates into a single SQL query and can itself be used as part of some even bigger query.

Looking at the size of the queries used and amount of logics in them, some of you may be asking the, IMHO, most important question in today’s development - "How do we unit test that?".

A cool aspect of using IQueryable is the fact that that you can take its Expression Tree part and use it with a different Provider, including the most flexible provider of all – in-memory POCOs. Just make sure, as with most code, that you are working with your data source via an abstracted interface. Entity Framework, for example, has multiple implementations of FakeDbSet available for that purpose.

*For good implementation of FakeDbContext see Visual Studio extension EntityFramework Reverse POCO Generator. It is the one used to generate EF context in the sample project and also in multiple production projects I architected at work - it is production proven.

//Every repository inherits base class
//and gets access to EF context via its property
public abstract class EFRepositoryBase
{
    public EFRepositoryBase(IAdventureWorksDataContext context)
    {
        DataContext = context;
    }

    protected IAdventureWorksDataContext DataContext
    {
        get;
        private set;
    }
}
    
//data context normally injected into your repos
public partial interface IAdventureWorksDataContext : System.IDisposable
{
    System.Data.Entity.DbSet<Address> Addresses { get; set; } 
    System.Data.Entity.DbSet<AddressType> AddressTypes { get; set; }
    //...
}

//data context used in unit testing
public partial class FakeAdventureWorksDataContext : IAdventureWorksDataContext
{
    public System.Data.Entity.DbSet<Address> Addresses { get; set; }
    public System.Data.Entity.DbSet<AddressType> AddressTypes { get; set; }
    ...
    public FakeAdventureWorksDataContext()
    {
        Addresses = new FakeDbSet<Address>("AddressId");
        AddressTypes = new FakeDbSet<AddressType>("AddressTypeId");
    }
}

For the unit test, you create an instance of a fake DB context, and populate its sets with in-memory data, run the query and assert the results.

protected FakeAdventureWorksDataContext Context()
{
    return new FakeAdventureWorksDataContext();
}

protected FakeDbSet<T> Set<T>(params T[] items) 
    where T : class
{
    var set = new FakeDbSet<T>();
    set.AddRange(items);
    return set;
}

[TestMethod]
public void GetBusinessEntityes_Test()
{
    var context = Context();
    context.BusinessEntities = Set(new BusinessEntity
    {
        BusinessEntityId = 2
    });

    context.People = Set(new Person()
    {
        BusinessEntityId = 2,
        FirstName = "FIRSTNAME",
        LastName = "LASTNAME"
    });

    context.BusinessEntityAddresses = Set(new BusinessEntityAddress()
    {
        BusinessEntityId = 2,
        AddressId = 3
    });

    context.Addresses = Set(new Address()
    {
        AddressId = 3,
        City = "CITY",
        StateProvinceId = 4,
    });

    context.StateProvinces = Set(new StateProvince()
    {
        StateProvinceId = 4,
        Name = "PROVINCE"
    });
                    
   var repo = new BusinessEntityRepository(context);

   var result = repo.GetBusinessEntities().ToArray();

    Assert.IsTrue(result.Count() == 1);
    var ent = result[0];
    Assert.IsTrue(ent.BusinessEntityId == 2);
    Assert.IsTrue(ent.ContactName == "FIRSTNAME LASTNAME");
    Assert.IsTrue(ent.Addresses.Count() == 1);
    var adr = ent.Addresses.ElementAt(0);
    Assert.IsTrue(adr == "The glorious city of CITY of the wonderful province of PROVINCE");
}

*With Entity Framework in particular, there is a limitation that has to be observed in order to keep your queries unit testable - you either have to avoid use of navigation properties in favor of explicit joins, or populate navigation property collections of items in your fake data sets. For now, we stick to the approach of explicit joins. When we will be switching to EF Core (7), there is a backlog task to make a mini lib which would autopopulate navigation collections, drawing relation information from model configurations of EF.

This covers the basic scenario of unit testing, but sometimes provider differences do show up. An example would be the use of 'DbFunctions.DiffDays(workOrder.StartDate, workOrder.EndDate)' line we've seen above. Since it represent a call to your DB built-in function, the C# representation only serves as a maker for provider. If you try to actually call it - you will get a NotImplemented exception. Since the function is static - that would normally be a problem in unit testing, however, remember from part 1, you can analyze the tree that makes up your query logics and modify it. In Part 3 - Expression Tree Modification we will see how to locate and shim such places in your query during tests.

Compiling enterprise

Archives

Elsewhere