Home > Software design >  Fast-CSV modify before loading CSV to MySQL
Fast-CSV modify before loading CSV to MySQL

Time:01-10

I am trying to load a CSV file to my MYSQL database, however before I do so I need to modify it slightly. The CSV file is Pipe delimitated (|) I have a column in the CSV file called Party:Identification. This column has results such as "a:hello, b:hi c:151 ......" This can go on infinitely. I only need to get the value for c. I have come up with a method that works for this, however I am stuck on how to modify the value before the file is inserted into the database.

I tried replacing all the ":" in the headers with "" and then using .transform to modify the values, however this doesn't appear to change the values in the column, only the header. Code is attached below.

 csv.parseFile(req.file.path, {
        headers: headers => headers.map(function (header) {
            const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
            console.log(newHeaders)
            return newHeaders
        }),
        delimiter: '|'
    })
        .transform(function(data) {
            console.log(data)
            PartyIdentification: getPartyID(data.partyIdentification)
        })
        .on("error", (err) => console.error(err))
        .on("finish", function () {
            query("LOAD DATA LOCAL INFILE '"  
                file  
                "' INTO TABLE table "  
                " FIELDS TERMINATED BY '|'"  
                " LINES TERMINATED BY  '\n'"  
                " IGNORE 1 ROWS;").then(r =>
                console.log(file)
            )
        })



function getPartyID(str) {
    if (str === undefined) return ""
    const split = str.split(",")
    const value = split.find(val => {
        return val.includes("c")
    })
    if(value === undefined) return ""
    return (value.split(":")[1].trim())
}

CodePudding user response:

You can use a regex to parse the value of c:123 in a string:

function getPartyID(str) {
  if (str === undefined) return "";
  const m = str.match(/\bc:([^ ]*)/); 
  return m ? m[1] : null;
}

[
  "a:hello, b:hi c:151 d:foo",
  "a:hello, b:no_c",
].forEach(str => {
  console.log(str, '==>', getPartyID(str));
});

Output:

a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null

Explanation of regex:

  • \b -- word boundary
  • c: -- literal text
  • ([^ ]*) -- capture group 1 with value, up to and excluding space

CodePudding user response:

Don't bother fixing the csv file before loading, simply toss the unwanted columns as you LOAD it.

This, for example, will load only the 3rd column:

LOAD DATA ...
    (@a, @b, c_col, @d, @e, ...)

That is, capture the unwanted columns into @variables that you will then ignore.

If you need to remove the c: before storing into the table, then

LOAD DATA ...
    (@a, @b, @c, @d, @e, ...)
    SET c_c0l = mid(@c, 3)

(or whatever expression will work. See also SUBSTRING_INDEX in case it would work better.)

LOAD DATA is plenty fast, even in this wasteful mode. And a lot less coding on your part.

  • Related