I have a rather unusual CSV file whose lines look like this:
"31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """
I've used GenericParserAdapter, but the result is not what I expected. Result (ItemArray):
[0] "31 lip 2021" object {string}
[1] "\"Inna opłata\"" object {string}
[2] "\"--\"" object {string}
[3] "\"--\"" object {string}
[4] "\"--\"" object {string}
[5] "\"--\"" object {string}
[6] "\"--\"" object {string}
[7] "\"--\"" object {string}
[8] "\"--\"" object {string}
[9] "\"--\"" object {string}
[10] "\"\"-1" object {string}
[11] "29\"\"" object {string}
[12] "\"EUR\"" object {string}
[13] "\"2 sie 2021\"" object {string}
[14] "\"111\"" object {string}
[15] "\"mBank *7981\"" object {string}
[16] "\"Środki zostały wysłane\"" object {string}
[17] "\"--\"" object {string}
[18] "\"111\"" object {string}
[19] "\"--\"" object {string}
[20] "--" object {string}
[21] "\"--\"" object {string}
[22] "\"--\"" object {string}
[23] "\"--\"" object {string}
[24] "\"--\"" object {string}
[25] "\"--\"" object {string}
[26] "\"--\"" object {string}
[27] "\"0%\"" object {string}
[28] "\"--\"" object {string}
[29] "\"--\"" object {string}
[30] "\"--\"" object {string}
[31] "\"--\"" object {string}
[32] "\"--\"" object {string}
[33] "\"\"-5" object {string}
[34] "7\"\"" object {string}
[35] "\"PLN\"" object {string}
[36] "\"\"4" object {string}
[37] "43151\"\"" object {string}
[38] "\"FEE-111\"" object {string}
[39] "\"\"Opłata za nazwę pomocniczą przedmiotu " object {string}
Columns 10 and 11 are split (as are 33/34 and 36/37), but each pair is really a single value and must not be split. How should I configure the parser (or split the lines myself) to resolve this?
Answer:
"31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """
Somehow the whole row has been turned into a single quoted field, and every double quote inside it has been escaped with another double quote.
The row should look like this instead (which parses fine):
31 lip 2021,"Inna opłata","--","--","--","--","--","--","--","--","-1,29","EUR","2 sie 2021","111","mBank *7981","Środki zostały wysłane","--","111","--",--,"--","--","--","--","--","--","0%","--","--","--","--","--","-5,7","PLN","4,43151","FEE-111","Opłata za nazwę pomocniczą przedmiotu "
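That is consistent with a valid row having been quoted once more as a single CSV field: wrap the value in quotes and double every quote inside it. The C# sketch below is only an illustration under the assumption of standard RFC 4180 quoting; QuoteField and the shortened sample row are hypothetical.

```csharp
using System;

// Quotes one CSV field the standard way: wrap it in double quotes and
// escape every double quote inside it by doubling it.
string QuoteField(string value) => "\"" + value.Replace("\"", "\"\"") + "\"";

// Hypothetical, shortened version of the row as it should look.
string goodRow = "31 lip 2021,\"Inna opłata\",\"--\",\"-1,29\",\"EUR\",--,\"4,43151\",\"FEE-111\"";

// Treating the whole row as a single field reproduces the broken shape.
string brokenRow = QuoteField(goodRow);
Console.WriteLine(brokenRow);
```

The printed value has the same shape as the rows in the file, including the three quotes at the very end (one escaped quote plus the closing quote).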
One solution might be to parse the data twice: first to recover the original row, then to parse that row.
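A minimal sketch of that two-pass idea in C#. It assumes the file follows RFC 4180 quoting and uses a small hand-written splitter (SplitCsvLine, hypothetical) instead of GenericParser, so it does not show how to configure that library. The sample row is a shortened, made-up version of the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Minimal, lenient RFC 4180-style splitter: a '"' starts or ends a quoted
// section, a doubled quote ("") inside a quoted section is one literal quote,
// and the delimiter only separates fields outside quoted sections.
List<string> SplitCsvLine(string line, char delimiter)
{
    var result = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;

    for (int pos = 0; pos < line.Length; pos++)
    {
        char ch = line[pos];
        if (inQuotes)
        {
            if (ch == '"' && pos + 1 < line.Length && line[pos + 1] == '"')
            {
                current.Append('"'); // escaped quote
                pos++;               // skip its second half
            }
            else if (ch == '"')
            {
                inQuotes = false;    // closing quote
            }
            else
            {
                current.Append(ch);
            }
        }
        else if (ch == '"')
        {
            inQuotes = true;         // opening quote
        }
        else if (ch == delimiter)
        {
            result.Add(current.ToString());
            current.Clear();
        }
        else
        {
            current.Append(ch);
        }
    }
    result.Add(current.ToString());
    return result;
}

// Hypothetical, shortened sample in the same doubly quoted shape as the question's row.
string raw = "\"31 lip 2021,\"\"Inna opłata\"\",\"\"--\"\",\"\"-1,29\"\",\"\"EUR\"\",--,\"\"4,43151\"\",\"\"FEE-111\"\"\"";

// Pass 1: the whole row is one quoted field, so parsing it undoes the doubling.
List<string> outer = SplitCsvLine(raw, ',');
string innerRow = outer[0];

// Pass 2: parse the recovered row as ordinary CSV.
List<string> fields = SplitCsvLine(innerRow, ',');
for (int i = 0; i < fields.Count; i++)
{
    Console.WriteLine($"[{i}] {fields[i]}");
}
```

In the first pass every comma sits inside the quotes, so the whole row comes back as one field with the doubled quotes collapsed; the second pass then keeps "-1,29" and "4,43151" intact. For a real file you would apply both passes to each line from File.ReadLines(path, encoding). The splitter is deliberately lenient (a quote in the middle of an unquoted value also opens a quoted section) and does not handle line breaks inside quoted values.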
Answer:
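In the end, I solved the problem by cleaning up each line, writing the result to a new file with ';' as the delimiter, and parsing that file: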
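```csharp
using System.Data;
using System.IO;
using System.Text;
using GenericParsing;

// GetEncoding() is not part of .NET; it is assumed to be a custom extension
// method that detects the file's encoding.
var kodowanie = sciezkaPliku.GetEncoding();
var plik = new StringBuilder();
var linie = File.ReadAllLines(sciezkaPliku, kodowanie);

// Undo the extra quoting on every line and switch the delimiter to ';',
// so that values such as "-1,29" keep their decimal comma.
// (Read the file once and loop over the array instead of re-reading it.)
for (int i = 0; i < linie.Length; i++)
{
    plik.AppendLine(linie[i]
        .Trim('"')              // drop the quotes wrapping the whole row
        .Replace(",\"\"", ";")  // ,"" -> ;  (delimiter before a quoted value)
        .Replace("\"\",", ";")  // "", -> ;  (delimiter after a quoted value)
        .Replace("\"\"", ""));  // remove the remaining doubled quotes
}

// Save the cleaned data next to the original file;
// WriteAllText overwrites any previous "_parsed" copy.
sciezkaPliku = $"{sciezkaPliku}_parsed";
File.WriteAllText(sciezkaPliku, plik.ToString(), kodowanie);

using (var parser = new GenericParserAdapter(sciezkaPliku, kodowanie))
{
    parser.FirstRowHasHeader = true;
    parser.ColumnDelimiter = ';';
    DataTable pozycje = parser.GetDataTable();
    foreach (DataRow item in pozycje.Rows)
    {
        //ToDo
    }
}
```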
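To see what the replace chain in the loop does, the same per-line cleanup can be pulled out into a function and run on one row. This is a sketch only: NormalizeLine and the shortened sample row are hypothetical, and the logic is copied unchanged from the loop.

```csharp
using System;

// The per-line cleanup from the loop, isolated (hypothetical name).
// Strips the outer quotes, turns ,"" and "", into ';' and drops any
// remaining doubled quotes.
string NormalizeLine(string line) =>
    line.Trim('"')
        .Replace(",\"\"", ";")
        .Replace("\"\",", ";")
        .Replace("\"\"", "");

// Hypothetical, shortened sample in the same doubly quoted shape as the question's row.
string raw = "\"31 lip 2021,\"\"Inna opłata\"\",\"\"--\"\",\"\"-1,29\"\",\"\"EUR\"\",--,\"\"4,43151\"\",\"\"FEE-111\"\"\"";

string normalized = NormalizeLine(raw);
Console.WriteLine(normalized);
```

The unquoted -- field survives and the decimal commas stay inside their values. Because the chain edits raw text rather than parsing quotes, a value that itself contained ';' would still be split by the parser, and a double quote that is part of a value would be lost; the two-pass parsing suggested in the earlier answer avoids both issues.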