Data masking – a way forward for analytics?
Duncan McKay, Business Development Manager at PBT Group
While not yet widespread, data masking is gaining momentum in the corporate environment as an effective strategy for meeting regulatory requirements to protect personal information.
Data masking creates a structurally similar but inauthentic version of a company's data for use in analytics. As the name suggests, it masks personal identifiers at the level of individual records. It is especially useful to quantitative analysts who build machine learning models to optimise and automate operational decision-making.
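To make the idea concrete, here is a minimal Python sketch of one simple masking technique, random character substitution, assuming each record is stored as a dictionary. The field names (`name`, `id_number`, `age`, `monthly_spend`) and the helpers `mask_value` and `mask_record` are hypothetical and chosen purely for illustration. Direct identifiers are replaced with random characters of the same kind, so the record keeps its shape, while the fields an analyst needs are left untouched. No mapping is kept, so a masked value cannot be traced back to the original person.

```python
import secrets
import string

# Hypothetical list of direct identifiers; every other field is kept for analysis.
IDENTIFIER_FIELDS = ("name", "id_number")


def mask_value(value: str) -> str:
    """Replace digits and upper/lower-case letters with random ones of the same kind.

    Spaces and punctuation are kept, so the masked value has the same
    length and layout as the original. No mapping is stored, so the
    original value cannot be recovered from the output.
    """
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(secrets.choice(string.digits))
        elif ch.isupper():
            masked.append(secrets.choice(string.ascii_uppercase))
        elif ch.islower():
            masked.append(secrets.choice(string.ascii_lowercase))
        else:
            masked.append(ch)
    return "".join(masked)


def mask_record(record: dict) -> dict:
    """Return a copy of the record with the identifier fields masked."""
    masked = dict(record)
    for field in IDENTIFIER_FIELDS:
        if field in masked:
            masked[field] = mask_value(str(masked[field]))
    return masked


# Hypothetical customer record used for illustration.
customer = {"name": "Jane Doe", "id_number": "8001015009087", "age": 44, "monthly_spend": 1250.0}
print(mask_record(customer))  # name and id_number are randomised; age and monthly_spend are kept
```

Commercial masking tools typically go further, for example by generating realistic-looking names or keeping validation rules such as check digits intact; this sketch does neither.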
In my experience, many large organisations are finding it difficult to govern what employees do with personal data, because their analysts are scattered throughout the business. As a result, these businesses are looking at either masking or tokenising their data. Masking obfuscates the data so that it cannot be linked back to the original person, while tokenisation swaps sensitive data for a randomised value in the same format that has no intrinsic value of its own.
Either approach gives companies an ideal way to build customer behaviour models without knowing who the individual customers are.
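For comparison, here is a minimal sketch of tokenisation, assuming an in-memory "token vault". The `TokenVault` class and its `tokenise` and `detokenise` methods are hypothetical, not the API of any real product. Each sensitive value is swapped for a random token of the same format, and the vault remembers the mapping. That is the key difference from masking as described above: the same input always receives the same token, so records can still be joined on it, and anyone with access to the vault can reverse the substitution.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory token vault, for illustration only.

    Each sensitive value is swapped for a random token of the same format.
    The vault stores the mapping, so the same value always gets the same
    token and authorised code can map a token back to the original.
    """

    def __init__(self) -> None:
        self._token_for: dict[str, str] = {}
        self._value_for: dict[str, str] = {}

    @staticmethod
    def _random_like(value: str) -> str:
        # Same length and layout as the input: digits become random digits,
        # letters become random letters, everything else is kept.
        return "".join(
            secrets.choice(string.digits) if ch.isdigit()
            else secrets.choice(string.ascii_letters) if ch.isalpha()
            else ch
            for ch in value
        )

    def tokenise(self, value: str) -> str:
        if value in self._token_for:
            return self._token_for[value]
        token = self._random_like(value)
        # Retry if this token is already assigned to a different value.
        while token in self._value_for:
            token = self._random_like(value)
        self._token_for[value] = token
        self._value_for[token] = value
        return token

    def detokenise(self, token: str) -> str:
        return self._value_for[token]


vault = TokenVault()
card = "4111 1111 1111 1111"  # a widely used test card number, not a real account
token = vault.tokenise(card)
print(token)                            # random digits with the same spacing as the card number
print(vault.tokenise(card) == token)    # True: same input, same token
print(vault.detokenise(token) == card)  # True: reversing requires the vault's mapping
```

In practice the vault would sit in a separate, tightly controlled system; keeping it in memory here only keeps the example short.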
The primary benefit of data masking or tokenisation is that it protects personal information. For example, a company might contract a data analysis firm to build analytical models without letting that firm see the contents of its data. If data masking is used, the contracted developers work in an environment where the data resembles the live data, without the risk of misusing the client's intellectual property.
However, the masked data must still contain the patterns found in live production data, and achieving this makes masking an extremely complex process. Significant expertise is required to ensure that none of the value behind the information is lost.
Furthermore, masking introduces a new step in the data lifecycle: data is captured and analysed, and then algorithms are developed to obfuscate it. Depending on the size of the project, this could significantly affect the timeline. It could also introduce new risks and pitfalls if the data sample that the company prepares for masking is not representative.
Because of this complexity, data masking is not yet prevalent in the local industry. Many organisations still prefer the more accessible tokenisation method, which protects against cyber risk but potentially sacrifices the protected data's analytical value.
Companies might only realise the value of masking and tokenisation after a cyber security incident that affects the quality of their data. Such an incident could be the catalyst that pushes companies in that direction. As that happens, it will be prudent to scrutinise privacy strategies that go beyond the operational security tokenisation provides. Companies must be sure to consider privacy approaches that also preserve the structure that gives the data its analytical insight, such as those offered by data masking.
Although it is difficult to predict how quickly data masking will be adopted, the opportunity to reinvent organisational systems in a post-lockdown world might give it the impetus it needs to gain traction and be used more widely, to the benefit of the analytics process as a whole.