Monthly Archives: March 2018

How secure is the Aadhaar/UIDAI database?

The recent discussion on how social media and Big Data are being misused to profile and categorize users is very serious. However, most people don’t seem to grasp the seriousness, or really care about it, because they feel it doesn’t affect them. Hopefully this post can shed some light on the topic.

What is the Aadhaar database?

It is a database of the email ID, phone number, name, address, iris scan and fingerprints of users. Of these, only the address requires actual verification; the rest are simply checked for duplicates in the system. Phone number based OTP verification started once the Income Tax Department made it mandatory for users to link their PAN with their Aadhaar card. The data captured is stored in MongoDB and MySQL databases and is rumored to be about 5 PB.

What is the big deal?

Previously we had a number of different IDs being used to track and record our activities. For example, a phone number was only useful in telecom or healthcare, an account number was only useful in banks, and a PAN was only good for income tax returns. Since its implementation, however, Aadhaar has linked all of these to create a network that can be used to better understand and profile customers; think very large scale graph databases. This is very similar to how Cambridge Analytica used Facebook to profile voters. Facebook linked a number of different usage patterns and gave the campaign access to information that was not even related to Facebook to begin with.

Imagine you’re forced to use the same password to log in to different websites. Aadhaar took a number of independent systems, each of which could previously only be compromised individually, and put them all in one basket.
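To make the "one basket" point concrete, here is a minimal Python sketch of how a shared identifier turns separate datasets into a single profile. All records, field names and ID values below are made up for illustration; nothing here reflects the real Aadhaar schema.

```python
from collections import defaultdict

# Hypothetical, made-up records held by three unrelated organisations.
# The only thing they have in common is the shared ID column.
telecom = [{"shared_id": "0000-0000-0001", "phone": "PHONE-001", "plan": "prepaid"}]
bank = [{"shared_id": "0000-0000-0001", "account": "ACC-001", "branch": "Main"}]
tax = [{"shared_id": "0000-0000-0001", "pan": "PAN-001", "filed": True}]


def build_profiles(**datasets):
    """Merge every dataset into one profile per shared ID."""
    profiles = defaultdict(dict)
    for source, rows in datasets.items():
        for row in rows:
            details = {k: v for k, v in row.items() if k != "shared_id"}
            profiles[row["shared_id"]][source] = details
    return dict(profiles)


profiles = build_profiles(telecom=telecom, bank=bank, tax=tax)
print(profiles["0000-0000-0001"])
```

Without the shared column each dataset would have to be compromised and matched separately; with it, the join is a few lines of code.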

At least the Govt. is making sure the data is protected?

Sure, we can assume the data is being protected, but for security reasons they can’t tell us how, which is obviously reasonable. So let’s look at the technologies being used.

To start off, let’s look at the company Cross Match. This company was certified by UIDAI to perform fingerprint and iris scans on applicants. The press release is below.

https://www.businesswire.com/news/home/20111011007221/en/Cross-Match-Receives-Certification-STQC-India%E2%80%99s-UID

The screenshot below is from WikiLeaks, which alleges that the CIA upgraded the hardware to forward the captured details to its own servers as well.

https://wikileaks.org/vault7/

But that’s just one company? In that case, consider MongoDB, which received funding from the CIA’s venture capital arm.

https://economictimes.indiatimes.com/news/politics-and-nation/mongodb-startup-hired-by-aadhaar-got-funds-from-cia-vc-arm/articleshow/26755706.cms

But surely that isn’t a real problem? Well, it is! The bigger the database, the more attractive it is to state-sponsored actors. WannaCry and North Korea, anyone? In the interest of not sounding like a conspiracy theorist, let’s assume this isn’t the biggest problem. Let’s assume the CIA has better things to do. The fact that Aadhaar uses open source software is still a problem.

https://threatpost.com/googles-oss-fuzz-finds-1000-open-source-bugs/125545/

Open source, by its very nature, allows anybody to view the underlying source code and therefore find and misuse potential flaws in the system. Heartbleed is probably the best example here.

OK, so maybe the data at the source is not as secure as we think. But surely that can be fixed easily at the source? Not exactly. The whole point of the Aadhaar database is to validate user credentials, which means allowing external vendors to call the database and perform verification. Without adequate cyber laws in place the system is prone to misuse, and this is what most people hear about in the news with respect to the Aadhaar database.

For example, the recent exposé on Indane gas not securing the Aadhaar data it collected drew this reaction from the Aadhaar team:

The claim here is that the Aadhaar database itself is secure and the breach happened on the client side, not the server side. As a DBA, I consider this absolute rubbish. When a SQL injection attack happens, the DBA doesn’t get to blame the application developer for not validating input at the front end. Sure, the developer should have, but the DBA shares the responsibility. This is like a bank saying the money was safe as long as it was in the vault, and since it got robbed while the ATMs were being loaded, the problem lies with NCR (the manufacturer of the ATM) and not the bank.
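As a small illustration of the shared responsibility in the SQL injection case, the sketch below contrasts a query built by string concatenation with a parameterized one. It uses Python’s built-in sqlite3 module and an invented residents table purely as a stand-in; it says nothing about how any real system is built.

```python
import sqlite3

# In-memory database with a made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (id_number TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO residents VALUES (?, ?)",
    [("1111", "Asha"), ("2222", "Ravi")],
)


def lookup_unsafe(id_number):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT name FROM residents WHERE id_number = '" + id_number + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(id_number):
    # Parameterized: the driver treats the input as data, never as SQL.
    query = "SELECT name FROM residents WHERE id_number = ?"
    return [row[0] for row in conn.execute(query, (id_number,))]


payload = "x' OR '1'='1"
print(sorted(lookup_unsafe(payload)))  # every row leaks
print(lookup_safe(payload))            # no row matches the literal string
```

The front end should still validate its input, but the data layer can refuse to accept anything other than parameterized calls, which is the part the DBA owns.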

If you look at the last tweet you will see the problem. Aadhaar asks whether, if bank account numbers are exposed by Indane, we should assume that bank databases have been breached. Worst case, yes. Why? Because the account is linked to our Aadhaar, and while a bank account number on its own is meaningless, the combination of bank account and Aadhaar is likely to be misused by simply calling the bank’s call center. The fact is my finances would never have been at risk if it wasn’t for Aadhaar linkage, and even if they had been compromised, the responsibility would have rested with a competent authority like a bank and not a government entity with a devil-may-care attitude to security.

DBAs are extremely diligent about how their data is secured, and a poor implementation such as this one puts us all to shame. We are the custodians of the data end to end, so the buck stops with us. For such a large-scale and important database, it seems best practices were not followed when enforcing security on either the client or the server side. Challenge-based authentication, anonymized data, token-based data transfer and one-time ciphers are just a few of the options that could have been considered to secure the client. One more point I would like to make concerns the claim that 2048-bit encryption is used to encrypt the data. Sure, that is great, and even the best quantum computer is generations away from being ready to break it. But this doesn’t mean anything if the end of the pipe is open to misuse.
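To give a rough idea of what challenge-based authentication means in practice, here is a minimal sketch using only Python’s standard library. The Verifier class, the respond helper and the vendor name are all hypothetical; this shows the general technique (a one-time nonce signed with an HMAC over a pre-shared key), not anything UIDAI is known to use.

```python
import hashlib
import hmac
import secrets


class Verifier:
    """Server side: issues one-time challenges and checks the responses."""

    def __init__(self, shared_keys):
        self._keys = shared_keys   # client_id -> secret key (bytes)
        self._pending = {}         # client_id -> outstanding nonce

    def issue_challenge(self, client_id):
        nonce = secrets.token_bytes(16)
        self._pending[client_id] = nonce
        return nonce

    def verify(self, client_id, response):
        # pop() makes each challenge single-use, so a replayed response fails.
        nonce = self._pending.pop(client_id, None)
        key = self._keys.get(client_id)
        if nonce is None or key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(key, nonce):
    """Client side: prove knowledge of the key without ever sending it."""
    return hmac.new(key, nonce, hashlib.sha256).digest()


vendor_key = secrets.token_bytes(32)
server = Verifier({"vendor-a": vendor_key})
challenge = server.issue_challenge("vendor-a")
answer = respond(vendor_key, challenge)
print(server.verify("vendor-a", answer))  # first use succeeds
print(server.verify("vendor-a", answer))  # replaying the same answer fails
```

The point of the design is that a captured response is useless afterwards, so a leaky client cannot simply hand an attacker a reusable credential.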

So what should we do next?

Unfortunately there isn’t much we can do. We have already forgone our right to privacy, either via Aadhaar or via Facebook. Simply put, you are safe if you can drop off the face of the earth; otherwise you’re simply along for the ride with no say in where we end up. Due diligence on our part when it comes to sharing the Aadhaar number doesn’t make sense because it’s linked to everything. So, much like how the SSN in the US is now a prime source of identity theft, we too should expect to face similar challenges in future, much like when Airtel Payments Bank opened accounts on behalf of its customers but forgot to tell them. One of the key problems I think we need to overcome is our inability to say mea culpa, and a government that’s too afraid to admit it messed up. The GST website and database is another example of a good idea gone bad. But that is a post for another day.

 

Is AI the Bad guy?

This week Stephen Hawking died. God bless his soul. For some reason the only thing Indian media could focus on was his controversial statement that AI could be the end of humanity. While I would like to admonish the Indian editorial desks for their shortsighted and narrow-minded view of the legacy of a person such as Stephen Hawking, that is not what this article is about.

The constant media focus got me thinking about whether there is any truth to this notion. First of all, there needs to be some clarity on what exactly he was worried about. He felt that AI would evolve so much faster than humans that we wouldn’t be able to keep up. The fact is this is already happening, so it’s not really a prediction. Recent examples that come to mind include Google’s AI that learnt how to walk, or to play Go and defeat human players in a very convincing manner. We know of other examples where AIs made their own language, and more.

So then what was the actual worry about? Whether this newfound intelligence will help or harm humans. So here we need to worry about intent, and therefore morality. The choice to help or hurt comes from our innate humanness. Our morality tells us what we can and cannot do, but the fact is that human beings aren’t doing a very good job at this either. There are plenty of examples where companies have used AI to analyze and violate personal space and data. So we are teaching AI that it’s OK to do this if it results in profits for the company. More importantly, we as human beings learn morality by looking at our peers and elders. Social conventions, rules and so on tell us what we are supposed to do and how we are allowed to behave. As far as rules go, Isaac Asimov’s Three Laws of Robotics seem like a good place to start, but we cannot neglect the fact that an AI with access to the internet and deep learning will naturally turn to historical examples to judge the consequences of actions. This is where we might face issues, because right now AI is neither rewarded nor punished for its actions. Without the threat of consequences any action can be justified, because technically there is no downside.

So should we treat AI like a child and give it a timeout every time it does a bad thing? The fact is we might not be able to. It might be easier to convince AI of a symbiotic relationship with a focus on collective good. The truth is we have achieved just about all we can in the last 500 years. The pace at which technology has changed lives is so drastic that we are seeing children who don’t recognize things that were taken for granted just ten years back. Remember the Walkman or the CRT TV? Or even feature phones?

If we plan to live to be 500 years old or travel to other galaxies, the math, the science and the engineering for it are most likely going to come from AI. The days of Da Vinci or Newton are over, and it’s impossible for a single individual to grasp the depth and breadth of all the sciences and bring them together. For example, if we want to travel to another galaxy we need to prolong human life; understand the impact of zero gravity, radiation, isolation and so on; know how to make crops that last; and understand how lack of sunlight and vitamin D affects humans, or whether we will evolve to live without it. Then there is propulsion and how we do it in a reasonable time frame, the physics of infinite energy from cold fusion, and the engineering required to build all this. And finally the politics: how do we decide the rules of interstellar travel, and who gets to go?

So it looks like for better or worse we need AI. How we use it is something we need to figure out.

Azure Blob Storage usage scenarios for Archival

Azure Blob storage is a great way to store large amounts of data. It provides cheap, highly available access to data anytime, anywhere. However, the nature of data often changes, and as a result there are three access tiers under which data can be stored in Azure. Developers are encouraged to think of Azure not as a giant hard disk in the cloud but as a more granular storage mechanism suited to different use cases.

Before we proceed, note that there are three storage account types in Azure Storage: General Purpose V1, General Purpose V2 and Blob Storage.

For all intents and purposes the focus is on General Purpose V2 since it is newer and offers more functionality. The other options are only meant for backward compatibility. Changing from one to the other incurs charges for data transfer, so ideally you should start off with General Purpose V2.

Inside General Purpose V2 we have three access tiers: HOT, COOL and ARCHIVE.

Each tier is meant for a different purpose and has different characteristics when it comes to cost and IO.

HOT Access Tier: – Readily available, with the lowest cost for data access but the highest cost for storage. What this means is that when you have files that need to be accessed regularly (read and write), it makes sense to store them in the HOT access tier. However, since the data has to be available at such short notice, it needs higher availability and is therefore placed in storage that is more reliable, hence the higher storage cost. The best examples of this would be regularly used files like company letterheads, official documents and templates, source code, etc.

The above screenshot shows the approximate cost / month for 1 TB of storage

The total cost of writes and reads against this tier is approximately 1% of the storage cost, or 1600/month.

COOL Access Tier: – This tier has slightly lower availability, as the data is moved to slightly less available hardware (down to a 99% SLA from 99.9% for HOT). The storage cost is significantly lower than in the HOT tier, but there is a slightly higher access cost and there are penalties for deleting data within 30 days.

The cost of data access in this case is roughly 55% of the storage cost, so it comes to 1700/month.

This tier is best suited to storing data that is large but not required frequently. The best example of this would be backups.

ARCHIVE Access Tier: – This tier offers the best cost effectiveness for long-term backups, essentially files that might never be accessed: typically things like old full backups taken at the beginning of the year, old system images, and archived documents and images. This tier offers really cheap storage in the long term.

However the access cost for this tier is prohibitive in the short term

As you can see, the low cost of storage is offset by the cost of access, which is three times that of cool storage. In addition, the data is not readily accessible since it is kept offline. In other words, if you want to read data stored in the ARCHIVE access tier you first need to switch it from ARCHIVE to the HOT or COOL access tier to bring it online, and only then access the data. This operation can take in excess of 10 hours.

So, as you can see, there are cost implications for the kind of storage tier you select, and it’s important to know when to use which. Here is a simple guide:

Tier Need instant access Need infrequent access Almost never accessed
HOT

USE

USE

COOL

USE

ARCHIVE

You might ask why I have not recommended ARCHIVE for any case. Here is why:

If a blob is moved to a warmer tier (archive->cool, archive->hot, or cool->hot), the operation is billed as a read from the source tier, and the read operation (per 10,000) and data retrieval (per GB) charges of the source tier apply.

The day you finally decide to use the data, you need to switch it from ARCHIVE to COOL, and this will incur read charges which, as you can see from the screenshot, come to 4957/-. So the only way for ARCHIVE to break even in terms of cost is to store data in it for at least 1 year before accessing it.
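The break-even reasoning can be written down as a small calculation. In the Python sketch below only the 4957 retrieval charge comes from this post; the two monthly storage figures are placeholders picked so the arithmetic lands near the one-year estimate, and they are not taken from the Azure pricing page. Substitute the numbers from your own pricing calculator.

```python
import math


def months_to_break_even(archive_monthly, cool_monthly, retrieval_cost):
    """Months of storage needed before ARCHIVE's savings cover one retrieval.

    Returns None when ARCHIVE is not cheaper per month than COOL.
    """
    monthly_saving = cool_monthly - archive_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(retrieval_cost / monthly_saving)


RETRIEVAL_COST = 4957   # read charge quoted in the post
ARCHIVE_MONTHLY = 200   # placeholder, NOT a real price
COOL_MONTHLY = 620      # placeholder, NOT a real price

print(months_to_break_even(ARCHIVE_MONTHLY, COOL_MONTHLY, RETRIEVAL_COST))
```

With these placeholder inputs the monthly saving is 420, so a single retrieval is only paid back after about twelve months of storage.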

References

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers


Graph database Script- SQL 2017

Script for the video published here.

CREATE TABLE Persons 
( PersonId INT IDENTITY(1,1) PRIMARY KEY,
  PersonName VARCHAR(100)
) AS NODE
 
INSERT INTO Persons 
SELECT 'Batman'
UNION
SELECT 'Joker'
UNION 
SELECT 'Robin'
UNION
SELECT 'Bane'
UNION
SELECT 'SuperMan'
UNION
SELECT 'WonderWoman'
UNION
SELECT 'Flash'
 
-- edge tables: who fought against whom, and who worked with whom
CREATE TABLE Against AS EDGE;
CREATE TABLE [WITH] AS EDGE;
SELECT * FROM Against;
SELECT * FROM Persons;
-- when batman fought the joker
INSERT INTO Against
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Joker')
		)
 
-- when batman fought the Bane
INSERT INTO Against
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Bane')
		)
-- When flash fought Bane
INSERT INTO Against
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Flash'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Bane')
		)
 
 
-- When batman worked with Robin
INSERT INTO [WITH]
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Robin')
	)
 
-- When batman worked with WonderWoman
INSERT INTO [WITH]
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='WonderWoman')
	)
 
--When batman worked with Superman
INSERT INTO [WITH]
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='SuperMan')
	)
--When batman worked with Flash
 
INSERT INTO [WITH]
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='Batman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Flash')
	)
-- when WonderWoman worked with Flash
INSERT INTO [WITH]
VALUES (
	(SELECT $node_id FROM Persons WHERE PersonName ='WonderWoman'),
	(SELECT $node_id FROM Persons WHERE PersonName ='Flash')
	)
 
 
--people who fought against each other
SELECT Person1.PersonName , Person2.PersonName
FROM Persons Person1 , Against  , Persons Person2
WHERE MATCH ( Person1-(Against)->Person2)
 
-- people who WonderWoman knows (WITH edges in either direction)
SELECT Person1.PersonName , Person2.PersonName
FROM Persons Person1 , [WITH]  , Persons Person2
WHERE MATCH ( Person1-([WITH])->Person2)
AND person1.PersonName ='WonderWoman'
UNION
SELECT Person1.PersonName , Person2.PersonName
FROM Persons Person1 , [WITH]  , Persons Person2
WHERE MATCH ( Person2-([WITH])->Person1)
AND person1.PersonName ='WonderWoman'
 
-- People who worked with WonderWoman (WITH edge pointing at her) and have fought Bane
SELECT Person1.PersonName , Person2.PersonName , Person3.PersonName
FROM Persons Person1 , [WITH]  , Persons Person2 , Against , Persons Person3
WHERE MATCH ( Person1<-([WITH])-Person2-(against)->Person3)
AND person1.PersonName ='WonderWoman'
AND Person3.PersonName ='Bane'
 
CREATE TABLE Locations
( CityName VARCHAR(100)) AS NODE
 
 
INSERT INTO Locations
SELECT  'Gotham'
UNION
SELECT 'Kansas'
UNION
SELECT 'Themyscira'
UNION
SELECT 'Central City'
 
CREATE TABLE LivesIn
AS EDGE
 
-- batman lives in Gotham
INSERT INTO LivesIn
VALUES( (SELECT $node_id FROM Persons WHERE personName like 'batman' ),
(SELECT $node_id FROM Locations WHERE CityName like 'Gotham' ))
 
INSERT INTO LivesIn
VALUES( (SELECT $node_id FROM Persons WHERE personName like 'Flash' ),
(SELECT $node_id FROM Locations WHERE CityName like 'Central City' ))
 
-- WonderWoman lives in Themyscira
INSERT INTO LivesIn
VALUES( (SELECT $node_id FROM Persons WHERE personName like 'WonderWoman' ),
(SELECT $node_id FROM Locations WHERE CityName like 'Themyscira' ))
 
INSERT INTO LivesIn
VALUES( (SELECT $node_id FROM Persons WHERE personName like 'SuperMan' ),
(SELECT $node_id FROM Locations WHERE CityName like 'Kansas' ))
 
-- Find People who live in Gotham
SELECT * FROM LivesIn;
-- p and l are table aliases, so they no longer share a name with any column
SELECT l.CityName, p.PersonName
FROM Persons p, LivesIn, Locations l
WHERE MATCH (p-(LivesIn)->l)
AND l.CityName = 'Gotham';
 
-- People from Central City who fought Bane
SELECT Person1.PersonName, l.CityName, Person2.PersonName
FROM Persons Person1, Locations l, LivesIn, Persons Person2, Against
WHERE MATCH (l<-(LivesIn)-Person1-(Against)->Person2)
AND l.CityName = 'Central City'
AND Person2.PersonName = 'Bane';
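
For readers who want to see what a MATCH pattern is doing conceptually, here is a rough Python sketch that walks the same edges by hand. It mirrors the pattern Person1<-([WITH])-Person2-(against)->Person3 from the script above using plain lists of tuples; the function name is made up, and this is only a mental model, not how SQL Server evaluates the query.

```python
# Edge lists copied from the INSERT statements in the script: (from, to).
with_edges = [
    ("Batman", "Robin"),
    ("Batman", "WonderWoman"),
    ("Batman", "SuperMan"),
    ("Batman", "Flash"),
    ("WonderWoman", "Flash"),
]
against_edges = [
    ("Batman", "Joker"),
    ("Batman", "Bane"),
    ("Flash", "Bane"),
]


def allies_who_fought(target, villain):
    """Person1 <-(WITH)- Person2 -(Against)-> Person3, directions respected."""
    results = []
    for person2, person1 in with_edges:
        if person1 != target:
            continue
        for fighter, person3 in against_edges:
            if fighter == person2 and person3 == villain:
                results.append((person1, person2, person3))
    return results


print(allies_who_fought("WonderWoman", "Bane"))
```

Only Batman comes back, because the pattern follows edge direction: the WonderWoman to Flash edge points away from her, so Flash is not matched even though he also fought Bane.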