Recently I was given a list of patients and asked to add a new column to the list containing each patient’s entire set of secondary diagnoses. All secondary diagnoses needed to appear in a single column on the same row with a comma between each distinct diagnosis.
This is a common, real world request faced by professional data analysts, so I thought the solution would be a good topic to write about.
The first challenge is that the data to be added is natively stored in columns but needs to transformed into a single concatenated row.
The second challenge is that the source data is one-to-one (that is, there is one record for each patient), but the diagnoses data is one-to-many (that is, there are multiple records for each patient).
There are several ways to achieve the desired outcome in SQL Server (such as COALESCE, Union All and PIVOT), but the most elegant solution is to use the XML directive in PATH mode as a subquery in the field where the concatenated results should populate.
This is the syntax:
[FIELD_NAME] = (
SELECT Some_Field_Stored_In_Columns + ', ' as 'data()' FROM Table_With_The_Columnar_Data AS "ColumnTable" WHERE ColumnTable.Some_Common_identifier = RowTable.Some_Common_Identifier FOR XML PATH('YourConcatenatedString')
)
This automatically transposes your columnar data into a concatenated string. It works by converting your discrete columns into a single attribute inside an XML row element named “YourConcatenatedString”. Naming your columnar field “data()” is what directs it do so.
The subquery will return results formatted like this:
[FIELD_NAME] = "Value1, Value2, Value3, Value4, "
The + ', '
portion is not strictly necessary. It just adds a delimiter between the values. If you omit it, your data will come back like this:
[FIELD_NAME] = "Value1 Value2 Value3 Value4 "
Happy data crunching.