The recent discussion on how social media and Big Data are being misused to profile and categorize users is very serious. However, most people don’t seem to grasp the seriousness or really care about it, because they feel it doesn’t affect them. Hopefully this post can shed some light on the topic.
What is the Aadhaar database?
It is a database of the email ID, phone number, name, address, iris scan and fingerprints of users. Of these, only the address requires actual verification; the rest are simply checked for duplicates in the system. Phone-number-based OTP verification started after the Income Tax Department made it mandatory for users to link their PAN with their Aadhaar card. The data captured above is stored in MongoDB and MySQL databases and is rumored to total about 5 PB.
What is the big deal?
Previously, a number of different IDs were used to track and record our activities. A phone number was only useful in telecom or healthcare, an account number was only useful in banks, and a PAN was only good for income tax returns. Since its implementation, however, Aadhaar has linked all of these to create a network that can be used to better understand and profile customers; think very large-scale graph databases. This is very similar to how Cambridge Analytica used Facebook to profile voters. Facebook linked a number of different usage patterns and gave the campaign access to information that was not even related to Facebook to begin with.
Imagine being forced to use the same password to log in to different websites. Aadhaar took a number of independent systems, each of which previously had to be compromised individually, and put them all in one basket.
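To make the linkage concrete, here is a minimal sketch of how records from unrelated services collapse into a single profile once they share one key. The records, field names and the `build_profiles` helper are all made up for illustration; nothing here reflects the real Aadhaar schema.

```python
from collections import defaultdict

# Hypothetical, made-up records from three otherwise unrelated services.
telecom = [{"phone": "98xxxx0001", "aadhaar": "A-111", "plan": "prepaid"}]
bank = [{"account": "ACC-42", "aadhaar": "A-111", "branch": "Chennai"}]
tax = [{"pan": "ABCDE1234F", "aadhaar": "A-111", "bracket": "20%"}]


def build_profiles(*datasets):
    """Merge every record that shares the same linking key into one profile."""
    profiles = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            key = record["aadhaar"]
            for field, value in record.items():
                if field != "aadhaar":
                    profiles[key][field] = value
    return dict(profiles)


profiles = build_profiles(telecom, bank, tax)
print(profiles["A-111"])
```

Without the shared key, the three lists cannot be joined reliably. With it, one leaked dataset is enough to start pulling in the others.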
At least the Govt. is making sure the data is protected?
Sure, we can assume the data is being protected, but for security reasons they can’t tell us how, which is obviously reasonable. So let’s look at the technologies being used.
To start off, let’s look at the company Cross Match. This company was certified by UIDAI to perform fingerprint and iris scans on applicants. Press release below.
The screenshot below is from WikiLeaks; according to it, the CIA upgraded the hardware to forward the details to its own servers as well.
But that’s just one company, right? In that case, please consider MongoDB, which has received funding from In-Q-Tel, the CIA’s venture capital arm.
But surely that isn’t a real problem? Well, it is! The bigger the database, the more attractive it is to state-sponsored actors. WannaCry and North Korea, anyone? In the interest of not sounding like a conspiracy theorist, let’s assume this isn’t the biggest problem and that the CIA has better things to do. The fact that Aadhaar uses open source software is still a problem.
Open source, by its very nature, allows anybody to view the underlying source code and therefore to find and misuse potential flaws in the system. Heartbleed is probably the best example here.
OK, so maybe the data at the source is not as secure as we think. But surely that can be fixed easily at the source? Not exactly. The whole point of the Aadhaar database is to validate user credentials, which means allowing external vendors to call the database and perform verification. Without adequate cyber laws in place the system is prone to misuse, and this is what most people hear about in the news with respect to the Aadhaar database.
For example, the recent exposé on Indane Gas not securing the Aadhaar data it collected drew this reaction from the Aadhaar team.
The claim here is that the Aadhaar database itself is secure, and that the breach happened on the client side, not the server side. Speaking as a DBA, this is absolute rubbish. When a SQL injection attack happens, the DBA doesn’t get to blame the application developer for not validating input at the front end. Sure, the developer should have, but the DBA shares the responsibility. This is like a bank saying the money was safe as long as it was in the vault, and since it was robbed while the ATMs were being loaded, the problem lies with NCR (the manufacturer of the ATM) and not the bank.
If you look at the last tweet you will see the problem. Aadhaar asks whether, if bank account details are compromised by Indane, we should assume bank databases have been breached. Worst case, yes. Why? Because the account is linked to our Aadhaar, and while a bank account number on its own is meaningless, the combination of bank account and Aadhaar can be misused by simply calling the bank’s call center. The fact is my finances would never have been at risk if it wasn’t for Aadhaar linkage, and even if they had been compromised, the matter would have been handled by a competent authority like a bank and not a Govt. entity with a devil-may-care attitude to security.
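The SQL injection point can be shown in a few lines. This is only a sketch using Python's built-in sqlite3 module and a made-up `residents` table; it says nothing about how the real Aadhaar services are written. It contrasts a query built by string concatenation with a parameterized one, which is the kind of access pattern the database side can insist on.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO residents (name) VALUES (?)", [("Asha",), ("Ravi",)])


def find_unsafe(name):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT name FROM residents WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(name):
    # Parameterized: the input is bound as data and never parsed as SQL.
    return conn.execute("SELECT name FROM residents WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
unsafe_rows = find_unsafe(payload)  # the injected OR clause matches every row
safe_rows = find_safe(payload)      # no resident has this literal name
print(len(unsafe_rows), len(safe_rows))
```

The front end should still validate its input, but the second form keeps the data safe even when it does not.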
DBAs are extremely diligent about how their data is secured, and a poor implementation like this one puts us all to shame. We are the custodians of the data end to end, so the buck stops with us. For such a large and important database, it seems best practices were not followed when enforcing security at either the client or the server side. Challenge-based authentication, anonymized data, token-based data transfer and one-time ciphers are just a few of the options that could have been considered to secure the client. One more point I would like to address is the claim that 2048-bit encryption is used to encrypt the data. Sure, that is great, and even the best quantum computer is generations away from being able to break it. But this doesn’t mean anything if the end of the pipe is open to misuse.
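As an illustration of the first option, here is a rough sketch of challenge-based authentication using an HMAC over a one-time nonce. The shared `VENDOR_KEY`, the function names and the in-memory set of used challenges are assumptions made for this example; it is not a description of how UIDAI authenticates vendors.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret, provisioned to one vendor out of band.
VENDOR_KEY = secrets.token_bytes(32)


def issue_challenge():
    """Server side: generate a fresh random nonce for this request."""
    return secrets.token_bytes(16)


def sign_challenge(key, challenge):
    """Client side: prove possession of the key without ever sending it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify_response(key, challenge, response, used_challenges):
    """Server side: accept each challenge once and compare in constant time."""
    if challenge in used_challenges:
        return False  # replayed challenge
    used_challenges.add(challenge)
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


used = set()
challenge = issue_challenge()
response = sign_challenge(VENDOR_KEY, challenge)
first_attempt = verify_response(VENDOR_KEY, challenge, response, used)
replay_attempt = verify_response(VENDOR_KEY, challenge, response, used)
print(first_attempt, replay_attempt)
```

Because every response is tied to a nonce that is accepted only once, a response captured at a leaky client cannot simply be replayed against the server later.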
So what should we do next?
Unfortunately, there isn’t much we can do. We have already forgone our right to privacy, either via Aadhaar or via Facebook. Simply put, you are safe if you can drop off the face of the earth; otherwise you’re along for the ride with no say in where you end up. Due diligence on our part when sharing the Aadhaar number doesn’t make sense, because it’s linked to everything. So, much like how the SSN in the US is now a prime source of identity theft, we too should expect to face similar challenges in future, as when Airtel Payments Bank opened accounts on behalf of its customers but forgot to tell them. One of the key problems I think we need to overcome is our inability to say mea culpa, and a Govt. that’s too afraid to admit it messed up. The GST website and database is another example of a good idea gone bad. But that is a post for another day.