I am trying to load a CSV file to my MYSQL database, however before I do so I need to modify it slightly. The CSV file is Pipe delimitated (|) I have a column in the CSV file called Party:Identification. This column has results such as "a:hello, b:hi c:151 ......" This can go on infinitely. I only need to get the value for c. I have come up with a method that works for this, however I am stuck on how to modify the value before the file is inserted into the database.
I tried replacing all the ":" in the headers with "" and then using .transform to modify the values, however this doesn't appear to change the values in the column, only the header. Code is attached below.
csv.parseFile(req.file.path, {
headers: headers => headers.map(function (header) {
const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
console.log(newHeaders)
return newHeaders
}),
delimiter: '|'
})
.transform(function(data) {
console.log(data)
PartyIdentification: getPartyID(data.partyIdentification)
})
.on("error", (err) => console.error(err))
.on("finish", function () {
query("LOAD DATA LOCAL INFILE '"
file
"' INTO TABLE table "
" FIELDS TERMINATED BY '|'"
" LINES TERMINATED BY '\n'"
" IGNORE 1 ROWS;").then(r =>
console.log(file)
)
})
function getPartyID(str) {
if (str === undefined) return ""
const split = str.split(",")
const value = split.find(val => {
return val.includes("c")
})
if(value === undefined) return ""
return (value.split(":")[1].trim())
}
CodePudding user response:
You can use a regex to parse the value of c:123
in a string:
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : null;
}
[
"a:hello, b:hi c:151 d:foo",
"a:hello, b:no_c",
].forEach(str => {
console.log(str, '==>', getPartyID(str));
});
Output:
a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null
Explanation of regex:
\b
-- word boundaryc:
-- literal text([^ ]*)
-- capture group 1 with value, up to and excluding space
CodePudding user response:
Don't bother fixing the csv file before loading, simply toss the unwanted columns as you LOAD
it.
This, for example, will load only the 3rd column:
LOAD DATA ...
(@a, @b, c_col, @d, @e, ...)
That is, capture the unwanted columns into @variables that you will then ignore.
If you need to remove the c:
before storing into the table, then
LOAD DATA ...
(@a, @b, @c, @d, @e, ...)
SET c_c0l = mid(@c, 3)
(or whatever expression will work. See also SUBSTRING_INDEX
in case it would work better.)
LOAD DATA
is plenty fast, even in this wasteful mode. And a lot less coding on your part.