ADO Datareader Bug? Datareader creates additional(unexpected) cells when reading from Excel Spreadsheet  
Author Message
Lurcher





PostPosted: .NET Framework Data Access and Storage, ADO Datareader Bug? Datareader creates additional(unexpected) cells when reading from Excel Spreadsheet Top

I wanted to be able to interrogate an Excel spreadsheet and programmatically save all the worksheets as text files (.csv).
I succesfully managed to do this using the .NET COM interop Excel collection of interfaces but even though this works beautifully I thought it would be nice if I could achieve the same results using the ADO.net classes.

For the purpose of this query please regard an individual Exel worksheet as a 'table'.
Having coded this I find that the Jet4.0 Excel Database engine in conjunction with a DataReader does not interpret the 'table' in the same way as the COM interop interfaces. The problem being that where a cell contains a "," (comma) the ADO datareader interprets that cell as two cells.

Eg "16, The Ridings" is interpretted as two cells even though when viewed in Excel it is in one cell. The interop Excel.Worksheet.SaveAs(...) works fine and interprets the above as a single cell but the DataReader does not. There does not seem to be any propery I can set on the DataReader to alter its interpretation of the delimiter and there also does not seem to be any property on the connection or command objects that I can set to alter the DataReader's behaviour. I am unable to figure out how to override this undesirable behaviour and wondered if anyone has come across this problem Is this a bug

Any help would be appreciated - Here is the connection string and the code:

Connection string:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\TestFiles\TestFile.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1"

private void DisplaySomeRows()

{

if (openFileDialog1.ShowDialog() == DialogResult.OK)

{

_fileNameandPath = openFileDialog1.FileName;

GetWorkSheetNames(_fileNameandPath);

}

string conn = GetConnectionString(_fileNameandPath);

lblConnectionstring.Text = conn;

DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.OleDb");

using (DbConnection connection = factory.CreateConnection())

{

connection.ConnectionString = conn;

using (DbCommand command = connection.CreateCommand())

{

//Create a directory based on the Spreadsheet name to store the csv versions of the worksheets

string newDirectory = _fileNameandPath.Remove(_fileNameandPath.LastIndexOf("."));

if (!Directory.Exists(newDirectory))

Directory.CreateDirectory(newDirectory);

foreach (string workSheetName in _workSheets)

{

//Create a csv file for this worksheet

StreamWriter sw = File.CreateText(newDirectory + "\\" + workSheetName + ".csv");

// Worksheets are referenced by their worksheet names -We require all rows from the worksheet

command.CommandText = "SELECT * FROM [" + workSheetName + "$]";

if (connection.State != ConnectionState.Open)

connection.Open();

StringBuilder dataLine = new StringBuilder();

//Populate the DataReader with data from the worksheet

using (DbDataReader dr = command.ExecuteReader())

{

while (dr.Read())

{

//Reset the string builder

dataLine.Remove(0, dataLine.Length);

for (int fieldCount = 0; fieldCount < dr.FieldCount; fieldCount++)

{

dataLine.Append(dr[fieldCount]);

if (fieldCount != dr.FieldCount - 1)

dataLine.Append(",");

}

sw.WriteLine(dataLine);

}

sw.Close();

}

}

}

}

}




.NET Development5  
 
 
Sarah Parra - MSFT





PostPosted: .NET Framework Data Access and Storage, ADO Datareader Bug? Datareader creates additional(unexpected) cells when reading from Excel Spreadsheet Top

I don't think the DataReader is the problem here. If you look at the data in the reader itself, you should see that your data with the comma is still a single field. In other words, if you print the data to the screen instead of to the file, you should see that the comma stays inside of the data and doesn't cause the field to split into two.

However, if you look at the logic you're using to create the dataLine value, that appears to be where the problem is.

If you have a row with Field1="sarah", Field2="1, fieldValue", and Field3="parra", your dataLine string is going to end up like this: "sarah, 1, fieldValue, parra". There is no distinction between a comma that's part of the data, and one that's the delimiter.

You need to surround your field values with quotes so Excel can tell the difference. Change this line:

dataLine.Append(dr[fieldCount]);

to this:

dataLine.Append("\"" + dr[fieldCount] + "\"");

Thanks,
Sarah

Please Mark as Answer if this answers your question, or Unmark as Answer if it is marked and you feel it is not answered.