Mastering LINQ for Efficient Data Manipulation in C# and Data Mining in ESAPI
- mschmidt33
- Sep 8, 2024
One of my favorite APIs in the .NET Framework ships directly from Microsoft and is included by default in many C# .NET Framework projects. The LINQ library, which stands for Language-INtegrated Query, allows collections of objects to be searched with inline code rather than tedious loops. Its intuitive structure mimics the query syntax of SQL. The popularity of this library among C# developers means there are plenty of resources for learning how to use LINQ. Through some examples, let's see how LINQ can help organize, filter, and select data in C# code and in ESAPI data mining.
A Brief Introduction to LINQ
Before diving into some LINQ queries, it helps to understand the individual components that make up a LINQ query. First, LINQ queries apply to collections. There are many types of collections in C#.
Arrays: Used to store multiple values of a single object type. If the array values are not defined inline, the length of the array must be included in its creation.
int[] numbers = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9};
Lists: Used to store multiple values of a single object type. Lists possess more features than arrays including methods to Add, Remove, and Clear the list. Both Lists and Arrays are also indexable (i.e. numbers[2] returns the 3rd value in the collection).
List<int> numbers = new List<int>() {1, 2, 3, 4, 5, 6, 7, 8};
numbers.Add(9);
IEnumerable: An interface that supports iterating over a collection. A variable typed as IEnumerable does not offer the modification methods that Lists do, and it is not indexable.
ObservableCollection: A type of collection similar to a List, but an ObservableCollection implements INotifyCollectionChanged and raises its CollectionChanged event when methods such as Add, Remove, or Clear are called.
There are other types of collections as well, including Dictionaries, Queues, Stacks, and more, but for these examples the types in the list above will suffice. Once a collection is instantiated (and the System.Linq namespace is imported with a using directive), IntelliSense will suggest many methods marked with the standard purple box, but with a gray arrow superimposed.
These methods are called Extension Methods. Extension methods allow an API to add methods to an existing type without the type needing to implement the method itself. When a LINQ query is called on a collection, it takes the following form:
Collection.LinqQuery(parameter => parameter[.Property] [Comparison])
The line above uses LINQ in method-based syntax as opposed to query syntax. Here, parameter is a variable name that represents a single item in the collection. The parameter follows the same naming rules as any other variable in C# (i.e. starts with a letter, does not contain spaces, cannot reuse a variable name from any parent scope, etc.) but otherwise can be any variable name. Next, the symbol "=>" is the lambda operator, and the expression built around it is called a lambda expression. A lambda expression in LINQ is an anonymous function that the LINQ method executes for the items in the collection. This allows for a concise way for a method to be passed as an argument or returned as a value in LINQ queries. Within the LINQ expression, if the parameter is a class object with properties, those properties can be accessed inline, and if the function compares the objects to each other or to a specified value, comparison operators can be used.
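To tie extension methods and lambda expressions together, here is a minimal sketch of a hand-written extension method that accepts a lambda. The names FilterExtensions and Keep are hypothetical and are not part of LINQ; the sketch only illustrates the general pattern of a static method whose first parameter is marked with this and whose second parameter is a function supplied by the caller.

```csharp
using System;
using System.Collections.Generic;

public static class FilterExtensions
{
    // Hypothetical extension method (not part of LINQ): the "this" modifier on
    // the first parameter lets it be called as if IEnumerable<T> defined it.
    public static IEnumerable<T> Keep<T>(this IEnumerable<T> source, Func<T, bool> test)
    {
        foreach (T item in source)
        {
            // The lambda supplied by the caller runs once per item.
            if (test(item))
            {
                yield return item;
            }
        }
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        // n is the parameter, n < 4 is the comparison.
        Console.WriteLine(String.Join(",", numbers.Keep(n => n < 4))); // prints 1,2,3
    }
}
```

The array type never declares a Keep method, yet the call reads like an instance method, which is the same mechanism that makes the LINQ methods appear on every collection.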
To understand a bit more about what's happening in a LINQ statement, take a look at the Where method applied to the numbers variable provided above.
numbers is an array variable. The Where method returns an IEnumerable of the same element type (int). The definition of Where takes a Func delegate following the pattern Func<T, TResult>, where the function takes a parameter of type T and returns an object of type TResult. Here, Where passes each int in the array to the function and receives a bool, true or false, based on a comparison of the developer's choosing. A predicate, in this context, is a function that returns a boolean value. To further see the syntax in action, the completed query can be used.
In the example above, the over variable stores the results of the LINQ expression, which are the values greater than 5 from the numbers variable. Using the syntax described in the paragraph above, the predicate of the Where operator returns true whenever a number in the numbers array is greater than 5. Finally, the resulting values are written to the console, separated by commas using the String.Join method.
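A minimal sketch of such a completed query, assuming the numbers array defined earlier. The predicate is first written out as an explicit Func<int, bool> (the names isOverFive and overExplicit are illustrative) to show what the lambda stands for, and is then passed inline with the result stored in a variable named over.

```csharp
using System;
using System.Linq;

public static class WhereDemo
{
    public static void Main()
    {
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        // The predicate as an explicit delegate: takes an int, returns a bool.
        Func<int, bool> isOverFive = n => n > 5;
        var overExplicit = numbers.Where(isOverFive);

        // The same predicate written inline, as it usually appears.
        var over = numbers.Where(n => n > 5);

        Console.WriteLine(String.Join(",", overExplicit)); // prints 6,7,8,9
        Console.WriteLine(String.Join(",", over));         // prints 6,7,8,9
    }
}
```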
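For comparison with the query syntax mentioned earlier, the same greater-than-5 filter can be sketched in both styles. The variable names methodResult and queryResult are only illustrative.

```csharp
using System;
using System.Linq;

public static class SyntaxDemo
{
    public static void Main()
    {
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        // Method-based syntax: a lambda is passed to the Where extension method.
        var methodResult = numbers.Where(n => n > 5);

        // Query syntax: the compiler translates this into a call to Where.
        var queryResult = from n in numbers
                          where n > 5
                          select n;

        Console.WriteLine(String.Join(",", methodResult)); // prints 6,7,8,9
        Console.WriteLine(String.Join(",", queryResult));  // prints 6,7,8,9
    }
}
```

Both forms produce the same sequence, so the choice between them is largely a matter of readability. The rest of this article stays with method-based syntax.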
Simple LINQ Examples
Working with Number Arrays
Building on the simple numbers example, more LINQ queries can be built from the same collection. After looking at all numbers in the array greater than 5, let's explore how to use the modulo operator (%) to find all even numbers in the numbers collection. Please see the result below.
Console.WriteLine("Even Numbers:");
var even = numbers.Where(n => n % 2 == 0);
Console.WriteLine(String.Join(",",even));
The Where method is deservedly popular, but there are others that can help as well. In the examples below, the code first uses ElementAt, which retrieves the value at a specified index of a collection. Note that even though IEnumerable collections are not indexable, ElementAt can still extract values by index. Next, a specified number of items in the collection can be skipped using the Skip method, and the Take method limits how many values are returned.
Console.WriteLine("4th value in numbers:");
var fourth = numbers.ElementAt(3);
Console.WriteLine(fourth.ToString());
Console.WriteLine("All Numbers after the 4th number:");
var after_fourth = numbers.Skip(4);
Console.WriteLine(String.Join(",",after_fourth));
Console.WriteLine("Only 2 numbers after 4th:");
var two_after_fourth = numbers.Skip(4).Take(2);
Console.WriteLine(String.Join(",",two_after_fourth));
LINQ also has some mathematical uses. Below see some examples of LINQ expressions that perform mathematical operations on the numbers array.
Console.WriteLine("Math with LINQ");
Console.WriteLine($"Max = {numbers.Max()}");
Console.WriteLine($"Min = {numbers.Min()}");
Console.WriteLine($"Sum = {numbers.Sum()}");
Console.WriteLine($"Average = {numbers.Average()}");
Working with String Arrays
Performing LINQ operations on string arrays allows for some unique examples, as String objects in C# have a number of properties and methods that can be leveraged inside LINQ expressions. Here, we start with a string array of cities where Gateway Scripts has hosted in-person courses. First, let's put the cities in alphabetical order. To order a string array alphabetically, only the parameter itself is passed into the OrderBy function. Collections of numerical values order naturally by their numerical value. Collections that contain custom class objects will likely require a property to be specified after the lambda operator to be ordered properly.
string[] cities = new string[] { "Denver", "Avignon", "Beijing", "Sydney", "Las Vegas", "Tampa" };
Console.WriteLine("Ordered Cities: ");
var ordered_cities = cities.OrderBy(c => c);
Console.WriteLine(String.Join(",", ordered_cities));
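As a sketch of the custom-class case, the example below orders a list of objects by one of their properties. The City class and its Name and Country properties are hypothetical, made up only for this illustration. Select is used at the end to pull the name out of each ordered object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class used only for this illustration.
public class City
{
    public string Name { get; set; }
    public string Country { get; set; }
}

public static class OrderByDemo
{
    public static void Main()
    {
        var cities = new List<City>
        {
            new City { Name = "Denver", Country = "USA" },
            new City { Name = "Avignon", Country = "France" },
            new City { Name = "Beijing", Country = "China" },
            new City { Name = "Sydney", Country = "Australia" }
        };

        // The lambda names the property to sort on; Select then extracts the name.
        var byCountry = cities.OrderBy(c => c.Country).Select(c => c.Name);

        Console.WriteLine(String.Join(",", byCountry)); // prints Sydney,Beijing,Avignon,Denver
    }
}
```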
A couple more examples find the cities containing the letter 'n' and the first city that starts with the letter 'A'. Both use String methods inside the LINQ expression to filter or single out a specific value.
Console.WriteLine("Cities that contain the letter 'n'");
var cities_with_n = cities.Where(c => c.Contains("n"));
Console.WriteLine(String.Join(",",cities_with_n));
Console.WriteLine("The first city that starts with the letter 'A'");
var cities_start_a = cities.FirstOrDefault(c => c.StartsWith("A"));
// FirstOrDefault returns a single string, so no String.Join is needed.
Console.WriteLine(cities_start_a);
When attempting to extract a single result from a collection, there are a few different LINQ expressions that can be used.
First() - Returns the first item in the collection for which the lambda expression returns true. If no item in the collection returns true, an exception is thrown.
Last() - Returns the last item in the collection for which the lambda expression returns true. If no item in the collection returns true, an exception is thrown.
Single() - Returns the single item in the collection for which the lambda expression returns true. If no item in the collection matches, or if more than one item meets the criteria, an exception is thrown.
FirstOrDefault, LastOrDefault, SingleOrDefault - Follow the descriptions above, but return the type's default value (null for reference types such as string) rather than throwing an exception when no item matches. SingleOrDefault still throws an exception if more than one item matches.
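The differences between these methods can be sketched with the cities array from above. The small numbers array at the end is added only to show that the default for a value type is not null.

```csharp
using System;
using System.Linq;

public static class SingleItemDemo
{
    public static void Main()
    {
        string[] cities = new string[] { "Denver", "Avignon", "Beijing", "Sydney", "Las Vegas", "Tampa" };

        // First and Last return the first and last matching items.
        Console.WriteLine(cities.First(c => c.Contains("n")));    // prints Denver
        Console.WriteLine(cities.Last(c => c.Contains("n")));     // prints Sydney

        // Single succeeds only when exactly one item matches.
        Console.WriteLine(cities.Single(c => c.StartsWith("T"))); // prints Tampa

        // No city starts with 'Z': FirstOrDefault returns null for strings...
        string missing = cities.FirstOrDefault(c => c.StartsWith("Z"));
        Console.WriteLine(missing == null);                       // prints True

        // ...while First throws an InvalidOperationException.
        try
        {
            cities.First(c => c.StartsWith("Z"));
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("First threw: no matching city");
        }

        // For value types the default is not null: default(int) is 0.
        int[] numbers = new int[] { 1, 2, 3 };
        Console.WriteLine(numbers.FirstOrDefault(n => n > 5));    // prints 0
    }
}
```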
LINQ in ESAPI
In the following examples, LINQ will be utilized to explore various collections extracted via the Eclipse Scripting API.
Collections within a Patient
For this first example, a single file plugin will be created in order to search through the various types of collections available in the patient.
Example 1: Accessing all PTVs, GTVs, and CTVs within a structure set.
In this first example, the target structures are extracted using the Where LINQ expression. Each structure is inspected on its DicomType property to see if the DicomType contains "TV" as in PTV, CTV, or GTV.
public void Execute(ScriptContext context /*, System.Windows.Window window, ScriptEnvironment environment*/)
{
    // TODO : Add here the code that is called when the script is launched from Eclipse.
    string message = String.Empty;
    var structureSet = context.StructureSet;
    var targets = structureSet.Structures.Where(st => st.DicomType.Contains("TV"));
    message += String.Format("Target structures: {0}\n", String.Join(",", targets));
    MessageBox.Show(message);
}
The resulting MessageBox shows all structures whose DicomType contains "TV", but the formatting is unexpected. By default, Structure objects converted to strings have the format "ID:Name". Another LINQ expression can be chained onto the query to specify the formatting. Updating the targets variable to the following line will show only the structure Ids in the MessageBox.
var targets = structureSet.Structures.Where(st => st.DicomType.Contains("TV")).Select(st=>st.Id);
Example 2: Determination of 2D image slices in a CT series.
In the next example, the LINQ method Count is used to count the number of images in a series whose size in the Z-direction is 1. The result is the number of slice images in the series.
var series = context.Image.Series;
var imageCount = series.Images.Count(i => i.ZSize == 1);
message += String.Format("CT Images: {0}\n", imageCount);
Filtering Patients with LINQ
In the final examples, LINQ will be used to filter patients and data in a data mining approach. In order to mine data through ESAPI, a stand-alone executable will be created using the Eclipse Script Wizard. Within the Execute method of the stand-alone executable template, all patients can be accessed with the following code.
static void Execute(Application app)
{
    // TODO: Add your code here.
    foreach(var summary in app.PatientSummaries)
    {
        Console.WriteLine($"{summary.LastName}, {summary.FirstName} ({summary.Id})");
    }
    Console.ReadLine();
}
Listing every patient this way is unrealistic for clinical databases that may hold a large volume of patients. Instead of inspecting every patient in a database, LINQ queries can be used to narrow the data into more manageable groups. In the following example, the latest 100 patients by creation date are extracted.
foreach(var summary in app.PatientSummaries
    .OrderByDescending(ps=>ps.CreationDateTime)
    .Take(100))
{
    Console.WriteLine($"{summary.LastName}, {summary.FirstName} ({summary.Id})");
}
In the final example, I would like to extract all the SBRT plans. First, we will create a simple PlanModel to hold the information from the plan that we want to relay to the user.
public class PlanModel
{
    public string PatientId { get; set; }
    public string CourseId { get; set; }
    public string PlanId { get; set; }
    public PlanModel(PlanSetup plan)
    {
        PatientId = plan.Course.Patient.Id;
        CourseId = plan.Course.Id;
        PlanId = plan.Id;
    }
}
Within the loop through the PatientSummaries, we first open the patient and loop through the courses. Then a Where LINQ expression combines two boolean conditions, checking both DosePerFraction and NumberOfFractions to decide whether the plan is an SBRT plan (the threshold of 500 presumably assumes that dose is reported in cGy). SBRT plans are added to our PlanModel list. Finally, the Select LINQ statement formats a string for each item in our collection, and the results are written to the console.
List<PlanModel> SBRTplans = new List<PlanModel>();
foreach (var summary in app.PatientSummaries
    .OrderByDescending(ps => ps.CreationDateTime)
    .Take(100))
{
    //Console.WriteLine($"{summary.LastName}, {summary.FirstName} ({summary.Id})");
    Patient patient = app.OpenPatient(summary);
    foreach(var course in patient.Courses)
    {
        var plans = course.PlanSetups
            .Where(ps => ps.DosePerFraction.Dose > 500
                && ps.NumberOfFractions <= 5);
        foreach(var plan in plans)
        {
            PlanModel planModel = new PlanModel(plan);
            SBRTplans.Add(planModel);
        }
    }
    app.ClosePatient();
}
Console.WriteLine(String.Join("\n",SBRTplans
    .Select(p=>$"Patient: {p.PatientId} Course/Plan: {p.CourseId}/{p.PlanId}")));
Console.ReadLine();
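The nested loops over courses and plans could also be flattened with the LINQ method SelectMany, which is a different technique from the loops shown above. Because ESAPI is only available alongside an Eclipse installation, the sketch below uses stand-in classes (FakeCourse and FakePlan, both hypothetical) that carry only the members the filter needs. The ids and dose values are invented sample data, the dose is assumed to be in cGy, and NumberOfFractions is assumed to be nullable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the ESAPI course and plan types.
public class FakePlan
{
    public string Id { get; set; }
    public double DosePerFraction { get; set; } // assumed to be in cGy
    public int? NumberOfFractions { get; set; } // assumed nullable
}

public class FakeCourse
{
    public string Id { get; set; }
    public List<FakePlan> PlanSetups { get; set; }
}

public static class SbrtDemo
{
    // Flattens every course's plans into one sequence, then filters and formats.
    public static IEnumerable<string> FindSbrt(IEnumerable<FakeCourse> courses)
    {
        return courses
            .SelectMany(c => c.PlanSetups, (c, p) => new { Course = c, Plan = p })
            .Where(x => x.Plan.DosePerFraction > 500 && x.Plan.NumberOfFractions <= 5)
            .Select(x => $"{x.Course.Id}/{x.Plan.Id}");
    }

    public static void Main()
    {
        // Invented sample data for illustration only.
        var courses = new List<FakeCourse>
        {
            new FakeCourse { Id = "C1", PlanSetups = new List<FakePlan>
            {
                new FakePlan { Id = "LungSBRT", DosePerFraction = 1000, NumberOfFractions = 5 },
                new FakePlan { Id = "Breast", DosePerFraction = 200, NumberOfFractions = 25 }
            }},
            new FakeCourse { Id = "C2", PlanSetups = new List<FakePlan>
            {
                new FakePlan { Id = "Spine", DosePerFraction = 800, NumberOfFractions = 3 }
            }}
        };

        Console.WriteLine(String.Join("\n", FindSbrt(courses))); // prints C1/LungSBRT then C2/Spine
    }
}
```

Whether this reads better than the explicit loops is a judgment call. The loops make the open and close of each patient easy to see, while SelectMany keeps the filtering logic in a single expression.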
Summary
LINQ is a powerful API for exploring data collections, extracting data, and inspecting collections for specific conditions. It's also a fun way to write code, and getting a query just right can feel like solving a puzzle. Check out LINQ to enhance your code's clarity, readability, and maintainability.