Friday, May 25, 2007

Validating an XML File using XSD in .NET .20

Below given is the.NET 1.1 code for validating an XML File using an XSD File.

.NET 1.1 Code

//.NET1.1 code
///
/// Methode to validate XML File
///

/// This method expects input XML as string
/// Path to schema file
/// true if xml is validated else false
private bool ValidateXmlUsingXsd(string XmlData,String SchemaPath)
{
XmlValidatingReader v = new XmlValidatingReader(XmlData, XmlNodeType.Document, null);
v.ValidationType = ValidationType.Schema;
v.ValidationEventHandler +=
new ValidationEventHandler(MyValidationEventHandler);

while (v.Read())
{
// Can add code here to process the content.
}
v.Close();

return isValid;

}
///
/// This event handler is called only when a validation error occurs
///

///
///
public static void MyValidationEventHandler(object sender,
ValidationEventArgs args)
{
//these two variables should be initialized as class level variables
isValid = false;
errorMessage = "Validation event\n" + args.Message;

}
///
/// Method to get XML in a string from an XML file
///

///
///
private string GetStringFromXML(string fileName)
{
StreamReader rd = new StreamReader(fileName);
string str = rd.ReadToEnd();
rd.Close();
return str;
}




Calling the method :
bool valid = ValidateXmlUsingXsd(str, txtXSD.Text);


There are some changes in the .NET 2.0 code for XML Validation .
XmlValidatingReader is marked as obsolete. Need to use XMLReader.Create() using XmlReaderSettngs instead

There are some behavioral changes between validation using the XmlReaderSettings and XmlSchemaSet classes and validation using the XmlValidatingReader class.

The XmlReaderSettings and XmlSchemaSet classes do not support XML-Data Reduced (XDR) schema validation.

The most important difference I found out is that to do XML data validation using a schema, settings.ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings;
Flag must be enabled. Otherwise the Schema check error will not be displayed.

//.NET2.0 code
private bool ValidateXmlUsingXsd2(string XmlData,String SchemaPath)
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, SchemaPath);
settings.ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

StringReader xmlStream = new StringReader(XmlData);
XmlReader reader = XmlReader.Create(xmlStream, settings);
while (reader.Read()) ;

return isValid;
}

private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
isValid = false;
errorMessage = "Validation Error: " + e.Message;
}


More about XmlSchemaValidationFlags Enumeration


Member name
Description
AllowXmlAttributes
Allow xml:* attributes even if they are not defined in the schema. The attributes will be validated based on their data type.
None
Do not process identity constraints, inline schemas, schema location hints, or report schema validation warnings.
ProcessIdentityConstraints
Process identity constraints (xs:ID, xs:IDREF, xs:key, xs:keyref, xs:unique) encountered during validation.
ProcessInlineSchema
Process inline schemas encountered during validation.
ProcessSchemaLocation
Process schema location hints (xsi:schemaLocation, xsi:noNamespaceSchemaLocation) encountered during validation.
ReportValidationWarnings
Report schema validation warnings encountered during validation.


In a nutshell, Always set settings.ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings; when a schema validation is required for the XML in .NET 2.0

Saturday, May 19, 2007

Truncation of fields when CSV file is read using ADO.NET

I encountered a major problem with ADO.NET CSV reader in one of the projects where CSV File Import played a major role.
When a field in the CSV file with a “–“(hyphen) is read, the characters before the “–“are discarded. For example the model F-150 is read as –150 and T-Bird in the model field is not being read. And some of the values were missing in some of the fields.
Given below is the code I used :

string strConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderName + ";Extended Properties=\"text;HDR=Yes; FMT=Delimited\"";
string sqlSelect = "select * from [" + fileName + "]";
System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(
strConnString.Trim());
conn.Open();
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(sqlSelect, conn);
DataSet ds = new DataSet();
adapter.Fill(ds, "Inventory");
return ds.Tables[0];


Samir found a solution, to add a schema file which identifies all fields in the csv file as string values.
And call DataSet.ReadXMLSchema() method to attach the schema to the DataSet. Also the schema constraints are not enforced .

modified code looks like this..


string strConnString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderName + ";Extended Properties=\"text;HDR=Yes; FMT=Delimited; IMEX=1\"";
string sqlSelect = "select * from [" + fileName + "]";
System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection(strConnString.Trim());
conn.Open();
System.Data.OleDb.OleDbDataAdapter adapter = new System.Data.OleDb.OleDbDataAdapter(sqlSelect, conn);
DataSet ds = new DataSet();

ds.ReadXmlSchema(Server.MapPath("xmlschema.xsd"));
ds.EnforceConstraints = false;


adapter.Fill(ds, "Inventory");
return ds.Tables[0];


Even after using this code values like "F-150" was read as -150 if there is only one row in the csv file.

F-100 reads as -100
K-100 reads as -100
S-100 reads as -100

100-F reads as -100
100-K reads as -100
100-S reads as -100
100$F reads as 100
100$K reads as 100
100$S reads as 100
F100 reads as 100
K100 reads as 100
S100 reads as 100
F\100 reads as 100
K\100 reads as 100
S\100 reads as 100
F.100 reads as .1
K.100 reads as .1
S.100 reads as .1
K-.\\$$$.\\1 reads as -0.1
K\\-22..$$\\21 reads as -22.221
-$\.FSK1 reads as -0.1

This happens only wen there is only one row in the csv file or more than half the values in a column is having the avove specified values

May be because ADO.NET does some internal calculation to treat F as floating point or something.. The fun part is that f-150 is read correctly.. problem is with capital letters :) .
So we ended up using a custom third party csv reader .
Moral of the story... never use ADO.NET csv reader... Always go for a custom CSV parser or a third party library.

Monday, May 14, 2007

Method to sort an array of strings in descending order of number of words in each array element

Just adding a method which i wrote for an application for which the requirement was scrapped.
Hope someone can refer to this silly method. ;)

///
/// Method to sort an array of strings in descending order of number of words in each array element
///

/// Array to be sorted
/// Array sorted in descending order of number of words in each array element

private static string[] SortArrayWithDescendingWordCount(string[] strArray)
{
//Array to store the number of words in each string of the array to be sorted
int[] wordLengths = new int[strArray.Length];
//variable to keep track of array index of wordLengths array.
int arrayIndex = 0;
foreach (string str in strArray)
{
//split the string in to an array of words and store the word count in wordLengths array.
wordLengths[arrayIndex] = str.Split(' ').Length; ;
arrayIndex++;
}
//Sort arrays(ascending order) by taking wordLenths array as key and strArray as value
Array.Sort(wordLengths, strArray);
//now reverse strArray array to sort the array in descending order of number of words in each array element
Array.Reverse(strArray);
return strArray;
}