GenericParserAdapter - parse eBay CSV transactions

I have an unusual CSV file whose lines look like this:

"31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """

I used GenericParserAdapter, but the result is not what I expected. Result (ItemArray):

        [0] "31 lip 2021"   object {string}
        [1] "\"Inna opłata\""   object {string}
        [2] "\"--\""    object {string}
        [3] "\"--\""    object {string}
        [4] "\"--\""    object {string}
        [5] "\"--\""    object {string}
        [6] "\"--\""    object {string}
        [7] "\"--\""    object {string}
        [8] "\"--\""    object {string}
        [9] "\"--\""    object {string}
        [10]    "\"\"-1"    object {string}
        [11]    "29\"\""    object {string}
        [12]    "\"EUR\""   object {string}
        [13]    "\"2 sie 2021\""    object {string}
        [14]    "\"111\""   object {string}
        [15]    "\"mBank *7981\""   object {string}
        [16]    "\"Środki zostały wysłane\""    object {string}
        [17]    "\"--\""    object {string}
        [18]    "\"111\""   object {string}
        [19]    "\"--\""    object {string}
        [20]    "--"    object {string}
        [21]    "\"--\""    object {string}
        [22]    "\"--\""    object {string}
        [23]    "\"--\""    object {string}
        [24]    "\"--\""    object {string}
        [25]    "\"--\""    object {string}
        [26]    "\"--\""    object {string}
        [27]    "\"0%\""    object {string}
        [28]    "\"--\""    object {string}
        [29]    "\"--\""    object {string}
        [30]    "\"--\""    object {string}
        [31]    "\"--\""    object {string}
        [32]    "\"--\""    object {string}
        [33]    "\"\"-5"    object {string}
        [34]    "7\"\"" object {string}
        [35]    "\"PLN\""   object {string}
        [36]    "\"\"4" object {string}
        [37]    "43151\"\"" object {string}
        [38]    "\"FEE-111\""   object {string}
        [39]    "\"\"Opłata za nazwę pomocniczą przedmiotu "    object {string}

Columns 10 and 11 are split (as are 33/34 and 36/37), but each pair is a single value and must not be split. How do I configure the parser properly (or split the line some other way) to resolve this?

CodePudding user response：

"31 lip 2021,""Inna opłata"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""--"",""-1,29"",""EUR"",""2 sie 2021"",""111"",""mBank *7981"",""Środki zostały wysłane"",""--"",""111"",""--"",--,""--"",""--"",""--"",""--"",""--"",""--"",""0%"",""--"",""--"",""--"",""--"",""--"",""-5,7"",""PLN"",""4,43151"",""FEE-111"",""Opłata za nazwę pomocniczą przedmiotu """

Somehow the full row is converted to a single field, and all double quotes are escaped with another double quote.

The row should look like this instead (which parses fine):

31 lip 2021,"Inna opłata","--","--","--","--","--","--","--","--","-1,29","EUR","2 sie 2021","111","mBank *7981","Środki zostały wysłane","--","111","--",--,"--","--","--","--","--","--","0%","--","--","--","--","--","-5,7","PLN","4,43151","FEE-111","Opłata za nazwę pomocniczą przedmiotu "

One solution might be to parse the data twice: a first pass to recover the original row from the single wrapped field, then a second pass to split that row into its actual columns.
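The two-pass idea can be sketched roughly as follows. This is only an illustration in plain C#, not GenericParser code: the class `DoubleEncodedCsv` and its methods `SplitFields` and `ParseRow` are hypothetical names, and the splitter is a minimal hand-written one that assumes comma delimiters, double-quote qualifiers, and no line breaks inside fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of GenericParser.
public static class DoubleEncodedCsv
{
    // Splits one line on commas that are outside double quotes.
    // Inside a quoted field, a doubled quote ("") yields one literal quote.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote
                    i++;                 // skip the second quote of the pair
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Pass 1 splits the raw line. If the whole line came back as one quoted
    // field, that field is the original row, so pass 2 splits it again.
    // Assumption: a genuine single-column quoted row never occurs in the file.
    public static List<string> ParseRow(string line)
    {
        var fields = SplitFields(line);
        if (fields.Count == 1 && line.StartsWith("\""))
        {
            return SplitFields(fields[0]);
        }
        return fields;
    }
}
```

With this sketch, a wrapped row and a normal row both go through `ParseRow`; values such as `-1,29` stay in one field because the comma sits inside quotes during the second pass.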

CodePudding user response：

In the end I resolved the problem like this:

            var kodowanie = sciezkaPliku.GetEncoding();
            var plik = new StringBuilder();
            // Read the file once; the loop bound reuses this array instead of re-reading the file.
            var linie = File.ReadAllLines(sciezkaPliku, kodowanie);
            for (int i = 0; i < linie.Length; i++)
            {
                // Strip the outer quotes, turn the separators next to doubled quotes into ';',
                // then drop the doubled quotes that remain.
                plik.AppendLine(linie[i]
                    .Trim('\"')
                    .Replace(",\"\"", ";")
                    .Replace("\"\",", ";")
                    .Replace("\"\"", ""));
            }
            sciezkaPliku = $"{sciezkaPliku}_parsed";
            // WriteAllText overwrites an existing file, so no explicit delete is needed.
            File.WriteAllText(sciezkaPliku, plik.ToString(), kodowanie);
            using (var parser = new GenericParserAdapter(sciezkaPliku, kodowanie))
            {
                parser.FirstRowHasHeader = true;
                parser.ColumnDelimiter = ';';
                var pozycje = parser.GetDataTable();

                foreach (var item in pozycje.Rows)
                {
                    // ToDo: process each row
                }
            }
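
The per-line string transformation used in this answer can also be pulled out into a pure function, which makes it easy to check on a sample line without touching the file system. The class `EbayCsvLine` and method `ToSemicolonDelimited` below are hypothetical names chosen for this sketch. Note that the approach assumes no field value contains a semicolon; if one did, it would be split into two columns.

```csharp
using System;

// Hypothetical helper wrapping the Replace chain from the answer above.
public static class EbayCsvLine
{
    // Converts one double-encoded line into a ';'-delimited line.
    public static string ToSemicolonDelimited(string line)
    {
        return line
            .Trim('"')               // remove the quotes wrapping the whole row
            .Replace(",\"\"", ";")   // separator before a quoted field
            .Replace("\"\",", ";")   // separator after a quoted field
            .Replace("\"\"", "");    // leftover doubled quotes
    }
}
```

For example, passing the raw line `"31 lip 2021,""Inna opłata"",""--"",--,""-1,29"",""EUR"""` yields `31 lip 2021;Inna opłata;--;--;-1,29;EUR`, so the decimal comma in `-1,29` no longer collides with the column delimiter.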