Saturday, May 19, 2007

Truncation of fields when CSV file is read using ADO.NET

I encountered a major problem with ADO.NET CSV reader in one of the projects where CSV File Import played a major role.
When a field in the CSV file with a “–“(hyphen) is read, the characters before the “–“are discarded. For example the model F-150 is read as –150 and T-Bird in the model field is not being read. And some of the values were missing in some of the fields.
Given below is the code I used :

string strConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderName + ";Extended Properties=\"text;HDR=Yes; FMT=Delimited\"";
string sqlSelect = "select * from [" + fileName + "]";
System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(
strConnString.Trim());
conn.Open();
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(sqlSelect, conn);
DataSet ds = new DataSet();
adapter.Fill(ds, "Inventory");
return ds.Tables[0];


Samir found a solution, to add a schema file which identifies all fields in the csv file as string values.
And call DataSet.ReadXMLSchema() method to attach the schema to the DataSet. Also the schema constraints are not enforced .

modified code looks like this..


string strConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderName + ";Extended Properties=\"text;HDR=Yes; FMT=Delimited; IMEX=1\"";
string sqlSelect = "select * from [" + fileName + "]";
System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(strConnString.Trim());
conn.Open();
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(sqlSelect, conn);
DataSet ds = new DataSet();

ds.ReadXmlSchema(Server.MapPath("xmlschema.xsd"));
ds.EnforceConstraints = false;


adapter.Fill(ds, "Inventory");
return ds.Tables[0];


Even after using this code values like "F-150" was read as -150 if there is only one row in the csv file.

F-100 reads as -100
K-100 reads as -100
S-100 reads as -100

100-F reads as -100
100-K reads as -100
100-S reads as -100
100$F reads as 100
100$K reads as 100
100$S reads as 100
F100 reads as 100
K100 reads as 100
S100 reads as 100
F\100 reads as 100
K\100 reads as 100
S\100 reads as 100
F.100 reads as .1
K.100 reads as .1
S.100 reads as .1
K-.\\$$$.\\1 reads as -0.1
K\\-22..$$\\21 reads as -22.221
-$\.FSK1 reads as -0.1

This happens only wen there is only one row in the csv file or more than half the values in a column is having the avove specified values

May be because ADO.NET does some internal calculation to treat F as floating point or something.. The fun part is that f-150 is read correctly.. problem is with capital letters :) .
So we ended up using a custom third party csv reader .
Moral of the story... never use ADO.NET csv reader... Always go for a custom CSV parser or a third party library.

No comments: