Monthly Archives: October 2014

When does a machine become human? When it can lie…

Don’t worry, I am not talking about androids or cyborgs. I am talking about the classical definition used to determine artificial intelligence. The most widely used benchmark, or at least the most well known, is:

  1. The Turing test: a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.

The key takeaway here is that artificial intelligence should be indistinguishable from a human. Now, computers such as IBM’s Watson have long proven that we have the technological capability to create a system fast enough to search the internet and return the correct or most logical answer nine times out of ten. However, we neglect a few other things that make us human. One of the most common examples is emotion.

To tell you how many people died of cancer last year and to say it with genuine concern are two different things; the latter has not been mastered even by some human beings.

This might be something we cannot achieve, since I believe it is what really makes us human. However, there is another aspect of human nature that I think we can program computers to do, and that is to lie. Like any BI system, Watson can crunch the numbers and tell us what happened or did not happen. The game changer is when we come close to self-awareness, i.e. when computers can lie in their own self-interest.

To do this, all we need is a decision tree that computers can use to determine when to lie. Below, I try to represent one such tree at a very high level.

The idea here is that once a data warehouse contains a list of all possible “lies” and their weights, ranked by benefit, damage and acceptability (how likely the target is to believe the lie), any system can be taught how to lie. Why am I making such a big deal about it? Being able to lie is one of the key elements of passive self-preservation. Active self-preservation will come when we build humanoid robots that know Karate :-). And intelligence (artificial or not) is knowing when to give the right answer, when to give the wrong answer and, most importantly, when to shut up.
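Just to make the idea concrete, here is a rough sketch of what such a decision could look like in plain T-SQL. The table, columns and scoring formula are entirely made up for illustration:

-- Hypothetical catalog of candidate lies, with weights for benefit,
-- damage and acceptability (how likely the target is to believe it).
CREATE TABLE dbo.LieCatalog
(
    LieId         int IDENTITY PRIMARY KEY,
    Statement     nvarchar(400),
    Benefit       decimal(5,2),  -- gain to the system if the lie is believed
    Damage        decimal(5,2),  -- cost to the system if the lie is caught
    Acceptability decimal(5,2)   -- 0 to 1: how believable the lie is
);

-- Decide whether to lie: pick the candidate whose weighted score beats
-- telling the truth (score 0); if nothing qualifies, stay silent.
SELECT TOP (1) Statement
FROM dbo.LieCatalog
WHERE (Benefit * Acceptability) - Damage > 0
ORDER BY (Benefit * Acceptability) - Damage DESC;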


Proof that MS SQL Server is the better relational database

I am probably going to start a war with this one, but I need to set the record straight. Often in my trainings I have a mix of MS SQL folks as well as Oracle folks, and the training usually takes a detour into which is the better relational database platform. While I have been an MS SQL guy all through my career, I have tried to be objective about the pros and cons of each system. However, I have always relied on the below website to prove my point when it comes to databases.

TPC.org is a non-profit organization aimed at providing an industry standard to measure performance of hardware and software at the enterprise level. Hardware and software vendors constantly compete for the top position as a validation of their configuration and capability.

The below screenshot shows the top 10 configurations for the new TPC-E benchmark, which simulates a financial brokerage firm with people buying and selling stocks. Notice how the top 10 is dominated by MS SQL Server 2012 and 2014 across different hardware vendors. However, I must call out that the better performance comes at a higher cost than Oracle on Linux systems, which is expected and not really a deterrent for a company looking to implement a high-performance database system.

Game Theory and Databases

CASE 1

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of speaking to or exchanging messages with the other. The police admit they don’t have enough evidence to convict the pair on the principal charge. They plan to sentence both to a year in prison on a lesser charge. Simultaneously, the police offer each prisoner a Faustian bargain. Each prisoner is given the opportunity either to betray the other, by testifying that the other committed the crime, or to cooperate with the other by remaining silent. Here’s how it goes:

  • If A and B both betray the other, each of them serves 2 years in prison
  • If A betrays B but B remains silent, A will be set free and B will serve 3 years in prison (and vice versa)
  • If A and B both remain silent, both of them will only serve 1 year in prison (on the lesser charge)

This is the classic Prisoner’s Dilemma, one of the most commonly cited examples of game theory.
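To make the payoffs concrete, here is a minimal sketch of the dilemma as a payoff table in T-SQL (the table and column names are my own):

-- Payoff matrix: years served for each combination of choices.
CREATE TABLE dbo.Payoff
(
    ChoiceA varchar(10),
    ChoiceB varchar(10),
    YearsA  int,
    YearsB  int
);

INSERT INTO dbo.Payoff VALUES
    ('Betray', 'Betray', 2, 2),
    ('Betray', 'Silent', 0, 3),
    ('Silent', 'Betray', 3, 0),
    ('Silent', 'Silent', 1, 1);

-- For every choice B could make, betraying leaves A no worse off,
-- which is why betrayal is the dominant strategy even though mutual
-- silence is the better collective outcome.
SELECT ChoiceB, ChoiceA, YearsA
FROM dbo.Payoff
ORDER BY ChoiceB, YearsA;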

With the recent announcement of the 2014 Nobel laureates, we have one more economist who has contributed to game theory. Game theory is the mathematical modelling of behaviour patterns for participants in a scenario. It is commonly used to predict how a system will behave given a set of boundary conditions and rules that each participant must follow, usually to control a resource or achieve a desirable end result.

The use of game theory in databases, especially within BI, is documented in the Decision Tree mining model. I once read a book that implemented game theory in such a way that at the end of each chapter the reader is given a choice; based on that choice, he is redirected to different pages, resulting in a different ending. Interestingly, regardless of the choices you make, the author was able to funnel them into one of three different endings. So while simulating the idea of choice, the game essentially didn’t provide any (the author could just as easily have provided a single ending to the story regardless of the interim choices).

Game theory is used everywhere, from bargaining to nuclear endgames to market power analysis. In retail, a simple example could be as follows:

CASE 2

In an online retail chain we have two scenarios:

| Scenario / Outcome | Purchase nothing | Purchase an additional $100 |
| --- | --- | --- |
| A shopper has zero store credit and is offered a $30 discount on a purchase of an additional $100 worth of products. | Highly likely | Very unlikely |
| A shopper has $30 in store credit and is asked to shop for an additional $100 worth of products, against which the $30 can be redeemed. | More likely | Highly likely |

Here the only difference is the shopper’s perception: in the first case they see it as losing 100 bucks, while in the second case they see it as using 30 bucks they already have.

The same thing applies to a number of different scenarios in business as well.

CASE 3

A company has a Project Manager position that needs to be filled. There are 10 candidates. They all improve their productivity while fighting for the same position, but only one of them wins, resulting in a 10x performance gain for the company and zero promotion scope for the remaining nine.

Alternatively, all 10 of them decide not to improve their performance, resulting in a zero performance gain for the company and the same zero promotion scope for 9 out of 10, presuming the company still fills the position with one of the 10 candidates.

Human nature shows the first scenario is more common; common sense tells us the second scenario is more beneficial to the candidates as a group.

BI has been used to validate actual data against predicted results for such endgame theories. Dynamically assigning weights to outcomes, to make them more or less desirable, can be used to strongly skew the result in a particular direction, which is essentially what we call bargaining.
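As a rough illustration of what that weighting could look like in a database (all names and numbers here are invented):

-- Each outcome has a base payoff and a tunable weight.
CREATE TABLE dbo.Outcome
(
    OutcomeName varchar(50),
    BasePayoff  decimal(9,2),
    Weight      decimal(5,2)
);

INSERT INTO dbo.Outcome VALUES
    ('Shopper buys nothing',             10.0, 1.0),
    ('Shopper spends an additional 100', 70.0, 1.0);

-- "Bargaining": re-weight the outcome you prefer, e.g. by sweetening
-- it with store credit, so the ranking skews in that direction.
UPDATE dbo.Outcome
SET Weight = 1.3
WHERE OutcomeName = 'Shopper spends an additional 100';

SELECT OutcomeName, BasePayoff * Weight AS WeightedPayoff
FROM dbo.Outcome
ORDER BY WeightedPayoff DESC;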

It’s a fascinating topic and just as complicated.


Cell level encryption

Encrypting data at the cell level is important in order to protect information within the database. While all users may have access to view data, not all users are equal. Therefore it makes sense to encrypt data in such a way that only authorized users who hold a key to decrypt the data can view it. Cell-level encryption has a number of drawbacks, especially when querying and searching for data; however, when there is a need to protect very sensitive information, there really is no other way to do so.

--- CREATE DATABASE MASTER KEY
CREATE MASTER KEY ENCRYPTION
BY PASSWORD = 'ENCRYPTDATABASE';

--- CREATE CERTIFICATE
CREATE CERTIFICATE MYCERTIFICATE
WITH SUBJECT = 'ENCRYPTDATABASECERTIFICATE';

--- CREATE SYMMETRIC KEY
CREATE SYMMETRIC KEY SYMMKEY
WITH ALGORITHM = TRIPLE_DES
ENCRYPTION BY CERTIFICATE MYCERTIFICATE;

--- CREATE SAMPLE SCHEMA AND TABLE
--- (assumed here so the example runs; the encrypted column must be varbinary)
CREATE SCHEMA example;
GO

CREATE TABLE example.customer
(
    CustomerName  varchar(100),
    CustomerEmail varbinary(256)
);

--- INSERT ENCRYPTED DATA
OPEN SYMMETRIC KEY SYMMKEY
DECRYPTION BY CERTIFICATE MYCERTIFICATE;

INSERT INTO example.customer (CustomerName, CustomerEmail)
SELECT 'Jayanth Kurup', ENCRYPTBYKEY(KEY_GUID('SYMMKEY'), 'jayanth.kurup@*******.com');

--- DECRYPT AND READ THE DATA
SELECT
    CustomerName,
    CONVERT(varchar(120), DECRYPTBYKEY(CustomerEmail)) AS 'Decrypted Email'
FROM example.customer;

CLOSE SYMMETRIC KEY SYMMKEY;



SELECT *
FROM example.customer;

The difference in execution plans for cell-encrypted and non-encrypted searches
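For context, a search against the encrypted column has to decrypt every row before the predicate can be evaluated, so the optimizer cannot use an index seek and falls back to a scan. A minimal sketch (the email value here is a made-up placeholder):

OPEN SYMMETRIC KEY SYMMKEY
DECRYPTION BY CERTIFICATE MYCERTIFICATE;

-- Every row is decrypted before the comparison, forcing a full scan.
SELECT CustomerName
FROM example.customer
WHERE CONVERT(varchar(120), DECRYPTBYKEY(CustomerEmail)) = 'someone@example.com';

CLOSE SYMMETRIC KEY SYMMKEY;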

sys.dm_exec_describe_first_result_set()

This is probably one of the most useful DMVs for developers and business intelligence users, since it describes the data types and other column-level information of a result set. Unlike sp_help, which describes a table, this particular DMV (strictly, a DMF) can be used to dynamically determine the data types and other usage options for any result set, i.e. it could be very helpful when using .NET to create SSIS packages or SSRS reports dynamically. It is important to remember that it describes the result set, so the output reflects the shape of the actual results rather than the underlying table definition.


The basic usage, as mentioned, is sys.dm_exec_describe_first_result_set(N'query here', N'parameters', 0 or 1).
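For example, to describe the columns that the sample query from the encryption post would return, without executing it:

-- Returns one row per column of the query's first result set.
SELECT name, system_type_name, is_nullable
FROM sys.dm_exec_describe_first_result_set(
    N'SELECT CustomerName, CustomerEmail FROM example.customer',
    NULL,  -- the query has no parameters
    0);    -- no browse-mode information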


Click here for the MSDN article on the above DMV


sys.dm_db_log_space_usage

Another DMV available in SQL Server 2012 is sys.dm_db_log_space_usage. This DMV is useful when trying to determine the current state of the log file, and whether there is any point in trying to shrink it. It could also help the DBA understand if the log file is constantly running out of space and therefore needing to auto-grow. The main use, however, would be to understand how much space can be reclaimed should the DBA decide to shrink the database.
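A simple way to read it, converting bytes to MB for readability:

-- Log size and usage for the current database.
SELECT
    total_log_size_in_bytes / 1048576.0 AS total_log_mb,
    used_log_space_in_bytes / 1048576.0 AS used_log_mb,
    used_log_space_in_percent
FROM sys.dm_db_log_space_usage;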


However, since this information on its own is not very useful, the better approach would be to use data collectors or other tools if you’re really looking to understand the log file usage of the system.

Like most of the other DMVs released with this version of SQL Server, the DMV itself is quite simple, and unhelpful unless combined with some knowledge of internals and the other DMVs that need to be used alongside it.

sys.dm_server_memory_dumps

Memory dumps are a copy of what lies in RAM during a particular scenario, such as a crash or even a manual dump. Memory dumps are useful when working with any application because they help identify exactly what was running on a system when the problem occurred. However, with certain applications such as SQL Server, a memory dump can be very large. There are mainly three types of dumps: the mini dump, which is the default dump format in SQL Server; the filtered dump, which is a snapshot of what is in RAM except the buffer pool; and the full dump, which is a complete copy of everything in RAM. It becomes clear that when taking full dumps on a system with, say, 64 GB of RAM, a single dump file can be about 60 GB, which is a significant amount of space and can cause the system to run out of disk.

The above DMV helps capture the number of such dump files and the space occupied by them. This will help the DBA understand if there are any files that are obsolete and can be deleted, or if he needs to change the way the dump trigger works so that it doesn’t generate so many dump files.
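A quick way to see what is already on disk, with sizes converted to MB:

-- List the dump files SQL Server knows about, newest first.
SELECT
    filename,
    size_in_bytes / 1048576.0 AS size_mb,
    creation_time
FROM sys.dm_server_memory_dumps
ORDER BY creation_time DESC;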

Most of the time the DBA won’t need to use this particular DMV unless he is already troubleshooting an issue.


The screenshot above shows the execution results for the DMV; for further details please visit the MSDN article here. For details on how to create a dump file, click here.

sys.dm_os_windows_info

A minor DMV in SQL Server, this particular one doesn’t really provide much information about the OS other than the version and service pack level. It is nowhere near the amount of information that could be captured through other means. You can’t even tell which edition of the OS it’s running, so I really don’t see any DBA using this DMV other than to casually check the OS version.

Click here for the MSDN article on this DMV.

SELECT *
FROM sys.dm_os_windows_info;


The only really useful information provided by this DMV might be the locale info, which most people are usually not sure about. If you find an interesting use for this DMV, please post a comment with the way you’re using it.