I was recently asked by a friend of mine what would be the best way to mask data in SQL Server. Now you may be aware that dynamic data masking was introduced in Microsoft SQL server version 2016. However for older versions of SQL server there are primarily just a few variations of same approach when it comes to masking data. The most common and easiest way to achieve this is just script out the data in a view and expose the view to end users or obfuscate the data while being fetched by a procedure.
Before we can talk about masking data it is important to understand when to mask data versus when to encrypt it. Typically you would mask information where you need to validate a portion of the information without accessing it completely. Example of this would be to mask the digits of the credit card so that you can verify the last 4 digits. At the same time you would still encrypt the CVV number. Another example would be masking email ID but encrypting the password for a log in. In either case the lines between went to masking and encrypting are blurring with recent advances. Encryption has a significantly higher overhead compared to masking. Therefore for high volume transactions masking is often considered advantageous over encryption. However this decision is often dictated by sensitivity and criticality of the data itself and therefore even though you might have high volume transactions there are scenarios where you would consider encryption over masking. Personally my preference is always refer encryption as masking is never foolproof. Having said that neither is encryption however in terms of effort, encryption will last longer than masking. Now let’s get into the different approaches with regard to masking.
To start off let’s explore permissions. One way that you can prevent unwanted access to sensitive information is by explicitly denying select permission on the column. This way non permissioned accounts when trying to access the column would encounter an error and therefore prevent access entirely. In the event access to this column is required, permissions are elevated within an application role. However this approach isn’t technically masking it is purely denying access to a column until an elevated privilege has been obtained. The below script demonstrates an example of this approach. Having said that let’s move on to actual masking.
Create table MaskMyData ( id int identity(1,1), Customername varchar(100) ) go insert into MaskMyData select 'Jayanth' UNION ALL select 'James' UNION ALL select 'Jayesh' GO CREATE USER [NoAccess] WITHOUT LOGIN; GO GRANT SELECT ON [dbo].[MaskMyData] ([id]) TO [NoAccess] GO EXECUTE AS USER = 'NoAccess'; select * from MaskMyData REVERT; GO EXECUTE AS USER = 'NoAccess'; select id from MaskMyData REVERT; GO
One of the easiest way to mask data is to add logic within the body of views or procedures to explicitly mask columns as part of the output. This approach prevents access by restricting the amount of information that will be presented as the output of the view or the procedure. This approach only works in scenarios where all data access is conducted using the specific view or procedure. Another limitation of this approach is the fact that it is much easier to implement during the early stages of development. At this stage you have much more control over how the data access is actually performed rather than trying to build it retrospectively. The below example demonstrate how to achieve this effect.
CREATE VIEW MaskCustomers AS SELECT id , left(CustomerName, 2)+'XXXXXXX' as CustomerName FROM MaskMyData GO select * from MaskCustomers
In scenarios where you’re trying to implement masking retrospectively it is probably easier to expose the data as a computed column rather than via a view. The idea being that the table is already in use within multiple pieces of code and for a fairly large application it might be difficult to identify all the moving parts and retrospectively fix them. In such cases you can create a function that will mask the data and then refer that function within a computed column followed by renaming the computed column as needed. This way all table that access the data continues to remain as is including column names. However due to the use of the scalar function to individually mask data coming from each row there might be a performance implication for highly transactional systems . This needs to be taken into account when designing the system. An example of this approach is shown below.
Create function dbo.Masker (@Input varchar(100)) Returns VARCHAR(100) AS BEGIN declare @Masked varchar(100) set @Masked= LEFT(@input,2) RETURN @Masked END GO ALTER TABLE MaskMyData ADD MasedCName as dbo.Masker(Customername) GO SELECT * FROM MaskMyData