I was reading a blog about how to write procedural code today ( link below) and wanted to take a moment to explore another approach just to provide a more complete reference. To provide some context I am explaining the business case here.
A hospital wants to identify people whose heights have changed over time in order to detect if the person is who they say they are. Since the height of a person doesn’t change after a certain age the approach is used to detect identity theft. The requirement is that we compare the heights logged during different visit and flag any patients where they have different heights logged.
The code to create the dummy data is mentioned below and was taken from the link mentioned earlier.
SET NOCOUNT ON; CREATE TABLE Dates (ID INT, VisitDate DATETIME); --populate table with 20 visit dates DECLARE @i INT, @startdate DATETIME; SET @i = 1; SET @startdate = GETDATE(); WHILE @i <= 20 BEGIN INSERT Dates (ID, VisitDate) VALUES (@i, @startdate); SET @startdate = DATEADD(dd,7, @startdate); SET @i = @i+1; END CREATE TABLE PatientHeight (PatientID INT NOT NULL ,Height INT); -- populate table with 1000 patientids with heights between 59 and 74 inches SET @i = 1; WHILE @i <= 10000 BEGIN INSERT PatientHeight (PatientID, Height) VALUES (@i, @i % 16 + 59); SET @i = @i+1; END ALTER TABLE PatientHeight ADD CONSTRAINT PK_PatientHeight PRIMARY KEY(PatientID); -- cartesian join produces 200,000 PatientVisit records SELECT ISNULL(PatientID, -1) AS PatientID, ISNULL(VisitDate, '19000101') AS VisitDate, Height INTO PatientVisit FROM PatientHeight CROSS JOIN Dates; ALTER TABLE PatientVisit ADD CONSTRAINT PK_PatientVisit PRIMARY KEY(PatientID, VisitDate); -- create changes of height SET @i = 3; WHILE @i < 10000 BEGIN UPDATE pv SET Height = Height +2 FROM PatientVisit pv WHERE PatientID = @i AND pv.VisitDate = (SELECT TOP 1 VisitDate FROM Dates WHERE id = ABS(CHECKSUM(@i)) % 19); SET @i = @i + 7; END /* -- return AdventureWorks to its previous state when you are finished -- with this example. DROP TABLE Dates; DROP TABLE PatientHeight; DROP TABLE PatientVisit; */
The first approach uses a cursor to iterate row by row and identify if there has been a change in height. The second approach uses a ranking function to perform a self-join based on patient id to identify any change. Both these approaches are detailed below as well to save the reader time.
CUROSR BASED APPROACH
-- CURSOR BASED APPROACH CREATE TABLE #Changes ( PatientID INT , VisitDate DATETIME , BeginHeight SMALLINT , CurrentHeight SMALLINT); DECLARE @PatientID INT , @CurrentID INT , @BeginHeight SMALLINT , @CurrentHeight SMALLINT , @VisitDate DATETIME; SET @PatientID = 0; DECLARE Patient_cur CURSOR FAST_FORWARD FOR SELECT PatientID , VisitDate , Height FROM PatientVisit ORDER BY PatientID ,VisitDate; OPEN Patient_cur; FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight; WHILE @@FETCH_STATUS = 0 BEGIN -- first record for this patient IF @PatientID <> @CurrentID BEGIN SET @PatientID = @CurrentID; SET @BeginHeight = @CurrentHeight; END IF @BeginHeight <> @CurrentHeight BEGIN INSERT #Changes ( PatientID , VisitDate , BeginHeight , CurrentHeight) VALUES (@PatientID , @VisitDate , @BeginHeight , @CurrentHeight); SET @BeginHeight = @CurrentHeight; END FETCH NEXT FROM Patient_cur INTO @CurrentID, @VisitDate, @CurrentHeight; END CLOSE Patient_cur; DEALLOCATE Patient_cur; SELECT * FROM #Changes DROP TABLE #Changes GO
SET BASED APPROACH
-- RANKING FUNCTION BASED APPROACH WITH PV_RN AS ( SELECT ROW_NUMBER() OVER (ORDER BY PatientID, VisitDate) AS ROWID, * FROM PatientVisit ) SELECT t1.PatientID ,t2.VisitDate AS DateChanged ,t1.Height AS HeightChangedFrom ,t2.Height AS HeightChangedTo FROM PV_RN t1 JOIN PV_RN t2 ON t2.ROWID = t1.ROWID + 1 WHERE t1.patientid = t2.patientid AND t1.Height <> t2.Height ORDER BY t1.PatientID, t2.VisitDate;
There is however a third option which I believe would be much simpler and is also much faster has been listed below. Here we use the math operator STDEV which calculates a standard deviation for the heights that were captured. When the height hasn’t changed STDEV will return a value of zero else a non-zero number which can be used to identify patient’s with different heights.
-- MATH OPERATOR BASED APPROACH -- HAS THE ADVANTAGE OF PROVIDING A MARGIN FOR ERROR FOR MEASUREMENTS AS WELL SELECT PatientID , MIN(Height) ,MAX(Height) FROM PatientVisit WHERE PatientID in ( SELECT patientid FROM PatientVisit GROUP BY PatientID HAVING STDEV(height) > 0) GROUP BY PatientID
Hope this approach helps clarify what we are doing with set based approach since recently I have noticed that ranking functions and CTEs seem to be the go to approach for most developers when they think set based logic.