Roslyn beyond 'Hello world' 06 - One-off code changes with Roslyn

[2022 January 06] .NET, C#, Roslyn, Workspaces, Static-Analysis, Refactoring

This article shows how to use Roslyn Workspaces to run quick one-off analyses and code changes on your .NET projects without going into the complexities of analyzers and code fixes. It builds upon part 1, where we created a query that analyzed logical expressions inside our code-base. This time we will use Workspaces to replace invocations of an obsolete method with a newer version.

Our synthetic code-base emulates a reporting system for a big corporation. The system pulls together information from accounting, HR, sales and supply systems and prepares analytical reports for our users. It's a big system that has been around for over a decade, so there is quite a bit of legacy code in it. Today we finally decided to get rid of the obsolete overload of our most-called method, ScheduleReport. We want to replace:

var scheduleId = 
   reportSchedulingSystem.ScheduleReport(reportName, false, id, null, null, null);

with

var scheduleId = 
   reportSchedulingSystem.ScheduleReport(
      reportName, 
      new UserIdsAcrossSystems(userIdInAccountingSystem: id),
      false).id;

in a fully automated manner.

🔗 Query

Sample code is in the same GitHub project as before, but this time we will be looking at RunCodeChangeExample. Once again we start by locating all the places in our code that we will be working on.


var callsToReplace = await GetContextsOfInterest(allNodes, async (x) =>
{
   var node = x.node as InvocationExpressionSyntax;
   if (node == null) { return null; }

   var methodMemberAccess = node.Expression as MemberAccessExpressionSyntax;
   if (methodMemberAccess == null) { return null; }

   var methodName = methodMemberAccess.Name;
   var methodNameAsText = methodName?.GetText().ToString();
   if (methodNameAsText != "ScheduleReport") { return null; }

   var model = (await compilation.Value).GetSemanticModel(
                  await x.document.GetSyntaxTreeAsync());

   var methodSymbolInfo = model.GetSymbolInfo(methodName).Symbol;

   // Make sure we've got exactly the method we are looking for,
   // and not one of its overloads.
   var displayStringWeAreLookingFor =
      "TestSubject.CodeReplacementTestbed.ReportSchedulingSystem"
      + ".ScheduleReport(string, bool?, int?, int?, int?, int?)";

   if ( methodSymbolInfo?.ToDisplayString() != displayStringWeAreLookingFor
      || false == methodSymbolInfo?.GetAttributes()
                     .Any(attribute => attribute.AttributeClass?.Name == "ObsoleteAttribute"))
   {
      return null;
   }

   // Ok, we've got exactly the method we were looking for.
   // Now let's get information about the call arguments.
   // Specifically, we extract the EXPRESSION that produces the
   // value of each argument. Expressions may be literals like '8'
   // and 'null', but they may also be complex method calls of their own,
   // like .ScheduleReport(reportName, GetPriorityFlag(userRole), ....).
   // We only care about the type to which each expression evaluates:
   // whether it is a literal 'true' or a call to GetPriorityFlag(userRole),
   // all that matters is that both are nodes which evaluate to a boolean
   // and, as such, can be interchanged or transplanted to any place in the
   // syntax tree where a boolean value is expected.
   // We just need to reorder them properly and drop unnecessary nulls.
   var parsedArguments =  ParseArguments(
                              node.ArgumentList, 
                              (IMethodSymbol)methodSymbolInfo);

   return new MethodReplacementContext(
      node,
      parsedArguments);

   static Dictionary<string, ExpressionSyntax> ParseArguments(
                                          ArgumentListSyntax argumentsList,
                                          IMethodSymbol methodSymbol)
   {
      var parameters = methodSymbol.Parameters.ToArray();

      var result = argumentsList.Arguments
            .Select((node, index) => (node, index))
            .ToDictionary(
               (x) => x.node.NameColon?.Name.ToString()// named argument
                        ?? parameters[x.index].Name,    // positional argument
               (x) => x.node.Expression);

      return result;
   }
});
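The overload check in the query relies on the display string of the resolved symbol. Here is a minimal, self-contained sketch of that idea. The Scheduler class and its Run overloads are hypothetical stand-ins for the real code-base, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class OverloadLookupSketch
{
    // Hypothetical miniature code-base: two overloads, one of them obsolete.
    const string Source = @"
using System;
class Scheduler
{
    [Obsolete] public int Run(string name, bool? priority) => 0;
    public int Run(string name) => 1;
    void Caller() { Run(""a"", null); Run(""b""); }
}";

    // Returns display strings of all invoked methods marked [Obsolete].
    public static string[] FindObsoleteCallTargets()
    {
        var tree = CSharpSyntaxTree.ParseText(Source);
        var compilation = CSharpCompilation.Create(
            "Sketch",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        var model = compilation.GetSemanticModel(tree);

        return tree.GetRoot()
            .DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            // The semantic model resolves each call to one concrete overload.
            .Select(call => model.GetSymbolInfo(call).Symbol)
            .Where(symbol => symbol != null
                && symbol.GetAttributes()
                         .Any(a => a.AttributeClass?.Name == "ObsoleteAttribute"))
            .Select(symbol => symbol.ToDisplayString())
            .ToArray();
    }
}
```

Both calls have the same method name, so a purely textual search could not tell them apart. Only the call that binds to the obsolete overload should come back from FindObsoleteCallTargets.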

The query is similar to what we had in part 1. Now comes the new part: we will actually construct new syntax sub-trees and replace old calls with new ones. This is done via the DocumentEditor class.

🔗 Editing graph with DocumentEditor

Solution solution = null; 

// Now we replace all found calls with equivalent calls to new method
foreach (var x in callsToReplace.GroupBy(x => x.document))
{
    solution = solution ?? x.Key.Project.Solution;
    // 'Solution' is the root node of our entire project graph,
    // solution  ==(1..x)==> projects ==(1..x)==> documents.
    // Since entire graph is immutable,
    // any change in it effectively produces a new solution graph,
    // so we must always start by getting the 'current' version
    // of the document we want to edit.
    var currentDocument = solution.GetDocument(x.Key.Id);
    var editor = await DocumentEditor.CreateAsync(currentDocument);

    foreach (var context in x)
    {
        editor.ReplaceNode(
            context.meta.node,
            PrepareNewMethodCall(
                                context.meta.node,
                                context.meta.arguments));
    }
    var newDocument = editor.GetChangedDocument();
    solution = newDocument.Project.Solution;
}

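// 'solution' is still null when no calls were found, so there is nothing to apply.
// TryApplyChanges reports failure by returning false, so we check the result.
if (solution != null && !workspace.TryApplyChanges(solution))
{
    throw new InvalidOperationException("Workspace rejected the changes.");
}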
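To see the immutability described in the comments in isolation, here is a small sketch that runs DocumentEditor against an in-memory AdhocWorkspace instead of a real solution. The project name, file name and source text are made up for illustration, and the sketch assumes the Microsoft.CodeAnalysis.CSharp.Workspaces package is referenced.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Editing;
using Microsoft.CodeAnalysis.Text;

public static class DocumentEditorSketch
{
    // Returns the document text before and after the edit.
    public static async Task<(string Before, string After)> ReplaceLiteralsAsync()
    {
        using var workspace = new AdhocWorkspace();
        var project = workspace.AddProject("SketchProject", LanguageNames.CSharp);
        var document = workspace.AddDocument(
            project.Id,
            "Sketch.cs",
            SourceText.From("class C { int a = 1; int b = 1; }"));

        var editor = await DocumentEditor.CreateAsync(document);

        // All replacements are expressed against nodes of the ORIGINAL tree;
        // the editor tracks them and applies the whole batch at once.
        var literals = editor.OriginalRoot
            .DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .ToArray();

        foreach (var literal in literals)
        {
            editor.ReplaceNode(
                literal,
                SyntaxFactory.LiteralExpression(
                    SyntaxKind.NumericLiteralExpression,
                    SyntaxFactory.Literal(2)));
        }

        var changedDocument = editor.GetChangedDocument();

        // 'document' still points at the old, unchanged snapshot.
        var before = (await document.GetTextAsync()).ToString();
        var after = (await changedDocument.GetTextAsync()).ToString();
        return (before, after);
    }
}
```

The old document keeps its original text while the changed document carries both replacements. This is why the loop in the article re-reads the current document from the latest solution before each round of edits.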

🔗 Constructing new Nodes

Ok, how do we create the new node? For that we will use the Microsoft.CodeAnalysis.CSharp.SyntaxFactory class, which contains static factory methods for most syntax nodes and tokens. We import it via using static to ease usage. Normally, the easiest way to create new code is to write it as a plain-text string and then parse it via ParseStatement(ourStr). But when we are transplanting considerable chunks of an existing syntax tree, it makes sense to construct nodes via factory methods, since Roslyn will verify that the resulting graph makes sense as we go. Here is how we do it:

static SyntaxNode PrepareNewMethodCall(
   InvocationExpressionSyntax oldCall, 
   Dictionary<string, ExpressionSyntax> oldArgumentExpressions)
{
   // When we prepare a new SyntaxTree, we start with the 'leaves',
   // nodes which have no descendants or very few of them.
   // This is because, ideally, we don't want to construct
   // graphs of more than 3 nodes at a time.
   // When we have the "leaves" - we compose them into branches,
   // then branches into more branches and so on recursively,
   // each time dealing with composition of just a few nodes.
   // There is nothing preventing us from constructing entire sub-tree
   // in one go, but that approach is usually harder to read and reason about
   // (you'll see why below).

   // First we prepare new arguments one by one
   var newReportNameArgument = Argument(oldArgumentExpressions["reportName"]);

   // Luckily for us, names of user ids don't change between old and new method,
   // else we would have to map them.
   var userIdParamNames = new string[]
   {
       "userIdInAccountingSystem",
       "userIdInHrSystemSystem",
       "userIdInSalesSystem",
       "userIdInSupplySystem"
   };

   var userIdExpressions = 
      oldArgumentExpressions
         .Where(x => userIdParamNames.Contains(x.Key)
                     && x.Value.Kind() != SyntaxKind.NullLiteralExpression)
         .ToArray();

   SyntaxNode newUserIdsArgument = null;
   if (userIdExpressions.Any()) 
   {
       // array of 'userIdInSystemX: expression'
       var userIdsAsNamedArguments = userIdExpressions
           .Select(x =>
                   Argument(x.Value)
                       .WithNameColon(
                           NameColon(IdentifierName(x.Key))));

       // new UserIdsAcrossSystems(
       //      userIdInSystemX: expression,
       //      userIdInSystemY: expression)
       newUserIdsArgument = 
         Argument(
            ObjectCreationExpression(
            IdentifierName("UserIdsAcrossSystems"))
               .WithArgumentList(
                  ArgumentList(
                  SeparatedList(userIdsAsNamedArguments))));
   }

   ArgumentSyntax newScheduleWithPriorityArgument = null;
   if (oldArgumentExpressions.TryGetValue(
            "priority", 
            out var scheduleWithPriorityExpression)
       && scheduleWithPriorityExpression.Kind() != SyntaxKind.NullLiteralExpression)
   {
       newScheduleWithPriorityArgument = Argument(scheduleWithPriorityExpression);
   }

   // Now we put new arguments in a list, if they are needed (not null).
   // To keep things beautiful, we switch to named arguments
   // once any positional argument is skipped.

   var newArgumentNodesList = new List<SyntaxNodeOrToken>();
   newArgumentNodesList.Add(newReportNameArgument);

   var switchedToNamedArgumentMode = false;
   if(newUserIdsArgument != null)
   {
       newArgumentNodesList.Add(Token(SyntaxKind.CommaToken));
       newArgumentNodesList.Add(newUserIdsArgument);
   } 
   else
   {
      switchedToNamedArgumentMode = true;
   }

   if(newScheduleWithPriorityArgument != null)
   {
       if(switchedToNamedArgumentMode)
       {
           newScheduleWithPriorityArgument = 
               newScheduleWithPriorityArgument
                   .WithNameColon(
                       NameColon(
                           IdentifierName("scheduleWithPriority")));
       }
       newArgumentNodesList.Add(Token(SyntaxKind.CommaToken));
       newArgumentNodesList.Add(newScheduleWithPriorityArgument);
   }

   var newCall = InvocationExpression(
                     MemberAccessExpression(
                       SyntaxKind.SimpleMemberAccessExpression,
                       (oldCall.Expression as MemberAccessExpressionSyntax).Expression,
                       IdentifierName("ScheduleReport")))
                   .WithArgumentList(
                        ArgumentList(
                           SeparatedList<ArgumentSyntax>(newArgumentNodesList)));

   // Since our new method returns an object instead of just id,
   // we must now add id extraction after call.
   var getIdFromNewCallResult = MemberAccessExpression(
                       SyntaxKind.SimpleMemberAccessExpression,
                       newCall,
                       IdentifierName("id"));

   // We must remember to call "NormalizeWhitespace",
   // without it there will be no spaces at all in code,
   // and it will not compile.
   getIdFromNewCallResult = getIdFromNewCallResult
                               .NormalizeWhitespace(elasticTrivia: true);

   return getIdFromNewCallResult;
}
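The NormalizeWhitespace call at the end of PrepareNewMethodCall is easy to forget, so here is a small sketch of what goes wrong without it. The type name Foo and the argument x are placeholders, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

public static class NormalizeWhitespaceSketch
{
    // Builds 'new Foo(x)' purely from factory calls, with no trivia attached.
    public static ExpressionSyntax BuildObjectCreation() =>
        ObjectCreationExpression(IdentifierName("Foo"))
            .WithArgumentList(
                ArgumentList(
                    SingletonSeparatedList(
                        Argument(IdentifierName("x")))));

    // Text of the tree exactly as constructed: tokens are glued together.
    public static string Raw() =>
        BuildObjectCreation().ToFullString();

    // Text after Roslyn has inserted conventional whitespace between tokens.
    public static string Normalized() =>
        BuildObjectCreation().NormalizeWhitespace().ToFullString();
}
```

Factory-built tokens carry no spaces of their own, so the raw text runs the new keyword straight into the type name and no longer parses as an object creation. Normalizing the finished sub-tree once, at the very end, restores the separators.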

Syntax graph construction looks a bit overwhelming, considering that all this code is for just a single invocation. Don't worry though, we don't have to remember how to construct it all by hand. Instead, there is a great tool by Kirill Osenkov, RoslynQuoter. We can drop in a snippet of the C# code that we would like to build, parse it as "Expression" and use the resulting API calls as a basis.

This functionality is also built into LINQPad.
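To get a feel for how a text snippet maps onto factory calls, here is a sketch that builds the same small expression both ways. The names service, Run and name are invented for illustration, and the factory calls were written by hand rather than copied from the tool, so treat them as an approximation of what it would produce.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

public static class ParseVersusFactorySketch
{
    // Route 1: let the parser build the tree from plain text.
    public static ExpressionSyntax FromText() =>
        ParseExpression("service.Run(name)");

    // Route 2: build the same tree node by node, leaves first.
    public static ExpressionSyntax FromFactory() =>
        InvocationExpression(
            MemberAccessExpression(
                SyntaxKind.SimpleMemberAccessExpression,
                IdentifierName("service"),
                IdentifierName("Run")))
        .WithArgumentList(
            ArgumentList(
                SingletonSeparatedList(
                    Argument(IdentifierName("name")))));
}
```

Parsing is shorter when the whole snippet is known up front. The factory route pays off once some of the leaves are existing nodes lifted from another tree, because they can be passed in as they are.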

As mentioned in the code comments, the best way is to start with leaf nodes and work up recursively from them. Once again we see the benefits of the Roslyn approach compared to regexes. When we are transplanting a node, we don't care how many lines of code are inside it. All we care about is its root type: as long as that fits into the new location, Roslyn will be able to construct a proper syntax tree incorporating it. Regexes will never be able to work comfortably with this level of complexity.
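The transplanting idea can be sketched in a few lines. Here we lift the first argument out of an old call and drop it into a new one without ever looking inside it. The method names OldMethod, NewMethod and Compute are hypothetical, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package is referenced.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;

public static class TransplantSketch
{
    // Takes the text of an invocation and returns a call to NewMethod
    // that reuses the first argument expression of the old call.
    // Throws InvalidCastException if the text is not an invocation.
    public static string Transplant(string oldCallText)
    {
        var oldCall = (InvocationExpressionSyntax)ParseExpression(oldCallText);

        // We never inspect the expression; it may be a literal or a deep call tree.
        ExpressionSyntax firstArgument = oldCall.ArgumentList.Arguments[0].Expression;

        var newCall = InvocationExpression(IdentifierName("NewMethod"))
            .WithArgumentList(
                ArgumentList(
                    SingletonSeparatedList(
                        Argument(firstArgument))));

        return newCall.NormalizeWhitespace().ToFullString();
    }
}
```

Calling Transplant("OldMethod(Compute(a, b + 1), null)") moves the whole Compute(...) sub-tree into the new call. The code would be identical if the first argument were a single literal, which is the point made above.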

Now we can run our project. It will find and replace 4 legacy calls in our code-base. We can use Git to check the differences (formatting is condensed to fit).


- var scheduleId = reportSchedulingSystem.ScheduleReport(reportName, 
         false, id, null, null, null);
+ var scheduleId = reportSchedulingSystem.ScheduleReport(reportName, 
         new UserIdsAcrossSystems(userIdInAccountingSystem: id), false).id;

- var scheduleId = reportSchedulingSystem.ScheduleReport(reportName: reportName, 
         null, null, null, null, null);
+ var scheduleId = reportSchedulingSystem.ScheduleReport(reportName).id;

- var scheduleId = reportSchedulingSystem.ScheduleReport("SH10: Pending shipping orders", 
         null, userIdInSupplySystem: 8);
+ var scheduleId = reportSchedulingSystem.ScheduleReport("SH10: Pending shipping orders", 
         new UserIdsAcrossSystems(userIdInSupplySystem: 8)).id;

- var scheduleId = reportSchedulingSystem.ScheduleReport(reportName,
         GetPriorityFlagFromConfigBasedOnUsersPositionInSales(usesPositionInSales), 
         null, null, salesId, suplyId);
+ var scheduleId = reportSchedulingSystem.ScheduleReport(reportName, 
         new UserIdsAcrossSystems(userIdInSalesSystem: salesId, userIdInSupplySystem: suplyId), 
         GetPriorityFlagFromConfigBasedOnUsersPositionInSales(usesPositionInSales)).id;

Voila. Now all that is left is to make sure that the code still compiles and to run the regression tests.

You have now seen just how easy it is to work with Syntax and Symbol graphs using Roslyn Workspaces. Welcome to the next level of .NET mastery and good luck in your endeavors :-).
