Adding CSV files as linked Server Connection

By | January 2, 2019

I often get a lot of migration projects where the data needs to port from CSV format into SQL Server. Here is a quick way I achieve it using Linked Server Connection in SQL Server. Why use Linked Server? Because it is very simple and easy to do. I could do the same using say and SSIS package but that would be too much effort some something as simple as a raw import. SSIS packages are useful when you want to do the same thing over and over.

The basic process here is the treat the folder which contains the files like it were a database and the individual files as table. Since we already have some technologies that have this format of storing data we can simply leverage their OLEBD driver to treat the data source in this way. Which is why we start off with installing Access Database OLEBD providers.

Download and install Access database OLEBD Providers from here.

Make sure to install the version that corresponds with your CPU architecture e.g. 32 vs 64 bit. Also you don’t need worry that it says Access Engine 2010 this will work on higher versions on SQL server too. Three cheers for backward compatibility.

Once Installed Run the below script to enabled SQL Server to access the provider which in turn can then access the files like as if they were tables.

 

SP_CONFIGURE 'show advanced options', 1;
GO
RECONFIGURE;
GO
SP_CONFIGURE 'Ad Hoc Distributed Queries', 1;
GO

 

AS you can see above, we need to install Ad Hoc Distributed queries. This is mainly so that we can query the metadata of the tables (CSV Files) from with SSMS. Without the above step you will encounter and error about access denied or could not establish a connection, depending on which version of SQL you’re querying from.

RECONFIGURE;

EXEC sp_MSset_oledb_prop N’Microsoft.ACE.OLEDB.12.0′,N’AllowInProcess’, 1

The above line of code is important so that the execution context of the query doesn’t change from that of the user who ran the query i.e in runs in the same context as the process calling it. With this we are done setting up the bare minimum need to do the job.

Next we need to setup the linked server, mostly because I already have a number of file and can’t be bothered to write openrowset queries for each file.

Once the above script has been executed you can create a linked server connection to the folder which has the csv files and SQL will in turn list the CSV files within the linked server connection as if they were tables.

EXEC master.dbo.SP_ADDLINKEDSERVER @server =N'CSV',@srvproduct=N'CSVFILES',@provider=N'Microsoft.ACE.OLEDB.12.0', @datasrc=N'G:\Data', @provstr=N'text'

GO

After this you can query the files just like a normal table such as select * from CSV…tablename

The logical next step would be how to import the tables into SQL Server so here is a simple easy to edit script that you can use to generate the tables and load the data as needed.

-- Creates a Temporary table to hold a list of table names
 
CREATE TABLE #tables
(tablename VARCHAR(100), 
newTableName AS REPLACE(tablename, '.','#') , 
DEPTH INT, 
[FILE] INT
)
-- INSERT file names and formats them into usables names for scripting out
INSERT INTO #tables ( tablename , DEPTH , [FILE])
EXEC Xp_dirtree 'G:\Data\Data' , 0,1
-- Generate the ETL script for loading data into a regular table.
SELECT  'select * into ['+REPLACE(tablename , SUBSTRING(tablename,CHARINDEX('_',tablename),100),'')+'] from [CSV]...'+newTableName
FROM #tables
 
IF exists (SELECT * FROM tempdb.INFORMATION_SCHEMA.tables WHERE TABLE_NAME like '%tables%')
DROP TABLE #tables