I want to parse information out of a string based on an input format with user-defined token placement (I define the possible tokens). For example:
Input Format: #artist# - #song# [#id#]
Input String: Taylor Swift - Anti-Hero [ID123]
I need to correctly pull out the artist, song, and ID.
My current approach is to replace everything outside of the token as a regex literal, then replace the tokens with the appropriate regex group.
let format = "#artist# - #song# [#id#]";
let split = format.split(/#\w*#/);
for (let i = 0; i < split.length; i++) {
  if (split[i] !== '') {
    let rep = split[i].split('').map(x => `\\${x}`).join('');
    format = format.replace(split[i], rep);
  }
}
format = format.replace('#artist#', '([\\w\\s]*)').replace('#song#', '([\\w\\s]*)').replaceAll('#ignore#', '([\\w\\s]*)');
let test = new RegExp(format, 'g');
let res = 'Taylor Swift - AntiHero [ID123]'.matchAll(test);
The above hard-coded example works, but when using the correct song name "Anti-Hero", the hyphen breaks my regex match on ([\w\s]*). Also, removing the brackets surrounding the ID breaks my results even though it could be a valid format. In this case, the brackets would just become part of the ID.
My approach as a whole seems incorrect, as there can be issues in my format replacement (a later ' ' will match inside the previously replaced ' - '), and non-word / non-space characters break the token search that I currently have. Is there a better way to do this?
CodePudding user response:
You might want to try this. The following function takes an input format and a string and outputs an object containing entries such as "song": "Anti-Hero" and "id": "ID123".
function extract(format, string) {
  // Get a list of the parts of the input format.
  // The odd entries will be keys like "artist" or
  // "song" and the even ones will be separators
  // that we don't need to store.
  let segments = format.split("#");
  // Create an object to contain key-value pairs
  // such as id: "ID123"
  let output = {};
  // Iterate through all of the segments in the format.
  for (let i = 0; i < segments.length; i++) {
    // If the current segment is empty, skip it.
    if (segments[i].length == 0) continue;
    if (i % 2 == 0) {
      // If the current segment is a separator like " - " or "["
      // skip past it and clip that part of the string.
      if (string.startsWith(segments[i])) {
        string = string.slice(segments[i].length);
      } else {
        throw new Error("String does not match format");
      }
    } else {
      // Find the distance until the next separator
      // (or the end of the string if there is none).
      let length = 0;
      while (!(segments[i + 1] && string.slice(length).startsWith(segments[i + 1]))
          && length < string.length) {
        length++;
      }
      // Store the key and value in the output object
      // and clip off the beginning of the string.
      output[segments[i]] = string.slice(0, length);
      string = string.slice(length);
    }
  }
  return output;
}
console.log(extract("#artist# - #song# [#id#]",
"Taylor Swift - Anti-Hero [ID123]"));
// Outputs { artist: "Taylor Swift", id: "ID123", song: "Anti-Hero" }
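If you would rather stay with a regex, as in the question, the same idea can be sketched like this. It is only a minimal sketch under a few assumptions: tokens look like #name# with word characters only, every literal piece of the format is escaped, and each token becomes a lazy "match anything" group instead of [\w\s]*. The names escapeRegExp and extractWithRegExp are made up for this example.

```javascript
// Hypothetical helper: escape regex metacharacters so that literal
// text from the format (such as "[" or "]") is matched verbatim.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turn a format such as "#artist# - #song# [#id#]"
// into an anchored regex and return the captured values by token name.
function extractWithRegExp(format, string) {
  // Splitting on a capturing group keeps the token names in the
  // result: odd indexes are token names, even indexes are literals.
  const parts = format.split(/#(\w+)#/);
  const names = [];
  let source = "";
  parts.forEach((part, i) => {
    if (i % 2 === 1) {
      names.push(part);
      source += "(.+?)"; // lazy: stop at the next literal separator
    } else {
      source += escapeRegExp(part);
    }
  });
  const match = new RegExp(`^${source}$`).exec(string);
  if (!match) return null; // string does not follow the format
  const output = {};
  names.forEach((name, i) => {
    output[name] = match[i + 1];
  });
  return output;
}

console.log(extractWithRegExp("#artist# - #song# [#id#]",
  "Taylor Swift - Anti-Hero [ID123]"));
console.log(extractWithRegExp("#artist# - #song# #id#",
  "Taylor Swift - Anti-Hero [ID123]"));
```

With the brackets removed from the format, the second call keeps them as part of the ID, which is the behaviour the question asks for. If a token name is repeated (for example #ignore#), the last value wins in this sketch.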
CodePudding user response:
In this example you can simply split on a longer string:
let myString = "Taylor Swift - Anti-Hero [ID123]"
let myParts = myString.split(" - ")
// myParts now an array with "Taylor Swift" and "Anti-Hero [ID123]"
It will still fail if you have song names with " - " in the title, but hopefully that is rarer! Hope this helps :)
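One way to soften that limitation, as a rough sketch: cut only at the first occurrence of the separator, so a " - " inside the song title stays in the second part. The helper name splitOnce is made up for this example, and an artist name containing " - " would still be split in the wrong place.

```javascript
// Hypothetical helper: split text at the first occurrence of
// separator only. Returns [text] if the separator is not found.
function splitOnce(text, separator) {
  const index = text.indexOf(separator);
  if (index === -1) return [text];
  return [text.slice(0, index), text.slice(index + separator.length)];
}

const [artist, rest] = splitOnce("Taylor Swift - Anti-Hero [ID123]", " - ");
console.log(artist); // "Taylor Swift"
console.log(rest);   // "Anti-Hero [ID123]"
```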