I am trying to read data from an Excel spreadsheet and write it to a CSV file. I am using the ToCSV() extension (https://extensionmethod.net/csharp/list-string/datareader-to-csv). To get rid of extra newline characters I added code that removes "\n", but this only fixed the issue for 90% of the data. Is there anything else I should remove from the original strings to make sure there are no extra rows?
public static List<string> ToCSV(this IDataReader dataReader, bool includeHeaderAsFirstRow, string separator)
{
List<string> csvRows = new List<string>();
StringBuilder sb = null;
if (includeHeaderAsFirstRow)
{
sb = new StringBuilder();
for (int index = 0; index < dataReader.FieldCount; index++)
{
if (dataReader.GetName(index) != null)
sb.Append(dataReader.GetName(index));
if (index < dataReader.FieldCount - 1)
sb.Append(separator);
}
csvRows.Add(sb.ToString());
}
while (dataReader.Read())
{
sb = new StringBuilder();
for (int index = 0; index < dataReader.FieldCount - 1; index++)
{
if (!dataReader.IsDBNull(index))
{
string value = dataReader.GetValue(index).ToString();
if (dataReader.GetFieldType(index) == typeof(String))
{
//if newline character is used in value, ensure each are replaced.
if (value.IndexOf("\n") >= 0)
value = value.Replace("\n", "");
//If double quotes are used in value, ensure each are replaced but 2.
if (value.IndexOf("\"") >= 0)
value = value.Replace("\"", "\"\"");
//If the separator is in the value, ensure the value is put in double quotes.
if (value.IndexOf(separator) >= 0)
value = "\"" + value + "\"";
if (value.IndexOf("CR LF") >= 0)
value = value.Replace("CR LF", "");
if (value.IndexOf("LF") >= 0)
value = value.Replace("LF", "");
}
sb.Append(value);
}
if (index < dataReader.FieldCount - 1)
sb.Append(separator);
}
if (!dataReader.IsDBNull(dataReader.FieldCount - 1))
sb.Append(dataReader.GetValue(dataReader.FieldCount - 1).ToString().Replace(separator, " "));
csvRows.Add(sb.ToString());
}
dataReader.Close();
sb = null;
return csvRows;
}
}
CodePudding user response:
Replacing \r and \n with an empty string would take care of eliminating newlines. However, I see you have replaced "CR LF" and "LF" with empty strings. This removes the literal text "CR LF" or "LF" wherever it appears in a value, not the special characters (carriage return and line feed).
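To illustrate the difference, here is a minimal sketch. The class name NewlineDemo and the helper StripNewlines are hypothetical names used only for this example; they are not part of the ToCSV() extension.

```csharp
using System;

public static class NewlineDemo
{
    // Hypothetical helper: removes the actual CR and LF control characters.
    public static string StripNewlines(string value)
    {
        return value.Replace("\r", "").Replace("\n", "");
    }

    public static void Main()
    {
        // A value containing a real carriage return + line feed pair.
        string value = "first line\r\nsecond line";

        // The literal text "CR LF" never occurs in the value, so nothing changes.
        Console.WriteLine(value.Replace("CR LF", "") == value); // True

        // Replacing the literal text "LF" damages ordinary words instead.
        Console.WriteLine("SHELF".Replace("LF", "")); // SHE

        // The escape sequences \r and \n match the control characters themselves.
        Console.WriteLine(StripNewlines(value)); // first linesecond line
    }
}
```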
CodePudding user response:
As LearnerMantis said, you are replacing the literal strings "CR LF" and "LF" and not the special characters.
If your program only reads files that come from the same environment it runs in, you can use the System.Environment.NewLine property.
The inner code of your loop would become:
string value = dataReader.GetValue(index).ToString();
if (dataReader.GetFieldType(index) == typeof(String))
{
value = value.Replace(System.Environment.NewLine, "");
}
sb.Append(value);
If you want to handle files from different environments, you can chain Replace calls directly like this:
string replaceWith = "";
string value = dataReader.GetValue(index).ToString();
if (dataReader.GetFieldType(index) == typeof(String))
{
value = value.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
sb.Append(value);
You could also use Regex.Replace (from System.Text.RegularExpressions) if you prefer:
value = Regex.Replace(value, @"\r\n?|\n", "");
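Putting the pieces together, the per-field cleanup could be pulled into one small helper, roughly as sketched below. CsvFieldCleaner and CleanField are hypothetical names, not part of the original extension. Note one deliberate difference from the question's code: this sketch also wraps the value in quotes when it contains a double quote, which is an assumption about the desired output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvFieldCleaner
{
    // Hypothetical helper, not part of the original ToCSV() extension.
    public static string CleanField(string value, string separator)
    {
        // Remove CR LF pairs, lone CR, and lone LF characters.
        value = Regex.Replace(value, @"\r\n?|\n", "");

        // Decide on quoting before the embedded quotes are doubled.
        bool needsQuotes = value.Contains(separator) || value.Contains("\"");

        // Double any embedded double quotes.
        value = value.Replace("\"", "\"\"");

        return needsQuotes ? "\"" + value + "\"" : value;
    }

    public static void Main()
    {
        Console.WriteLine(CleanField("line one\r\nline two", ",")); // line oneline two
        Console.WriteLine(CleanField("a,b\nc", ","));               // "a,bc"
    }
}
```

Inside the loop, the string branch would then reduce to a single call such as value = CleanField(value, separator); before sb.Append(value).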