How to parse strings based on user provided format tokens?

Time:12-28

I want to parse information out of a string based on an input string and a user-defined token placement (I define the possible tokens). For example:

Input Format: #artist# - #song# [#id#]

Input String: Taylor Swift - Anti-Hero [ID123]

I need to correctly pull out the artist, song, and ID.

My current approach is to escape everything outside of the tokens so it is treated as literal text in a regex, then replace each token with the appropriate regex group.

let format = "#artist# - #song# [#id#]";
let split = format.split(/#\w*#/);

for (let i = 0; i < split.length; i++) {
  if (split[i] !== '') {
    let rep = split[i].split('').map(x => `\\${x}`).join('');
    format = format.replace(split[i], rep);
  }
}
format = format.replace('#artist#', '([\\w\\s]*)').replace('#song#', '([\\w\\s]*)').replaceAll('#ignore#', '([\\w\\s]*)');
let test = new RegExp(format, 'g');
let res = 'Taylor Swift - AntiHero [ID123]'.matchAll(test);

The hard-coded example above works, but with the correct song name "Anti-Hero" the hyphen breaks my regex match on ([\w\s]*). Removing the brackets surrounding the ID in the format also breaks my results, even though that could be a valid format. In that case, the brackets should just become part of the ID.
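A minimal sketch of the hyphen failure, using only the character class from the code above (the variable name `wordsAndSpaces` is just for illustration):

```javascript
// [\w\s] covers letters, digits, underscore, and whitespace, but not "-".
const wordsAndSpaces = /^[\w\s]*$/;

console.log(wordsAndSpaces.test("AntiHero"));  // true
console.log(wordsAndSpaces.test("Anti-Hero")); // false
```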

My approach as a whole seems incorrect: the format replacement itself can go wrong (' ' will match inside the previously replaced ' - '), and any character that is not a word character or whitespace breaks the token capture I currently have. Is there a better way to do this?

CodePudding user response:

You might want to try this. The following function will take in an input format and string and output an object containing entries such as "song": "Anti-Hero" and "id": "ID123".

function extract(format, string) {
  // Get a list of the parts of the input format.
  // The odd entries will be keys like "artist" or
  // "song" and the even ones will be separators
  // that we don't need to store.
  let segments = format.split("#");

  // Create an object to contain key-value pairs
  // such as id: "ID123"
  let output = {};

  // Iterate through all of the segments in the format.
  for (let i = 0; i < segments.length; i++) {
    // If the current segment is empty, skip it.
    if (segments[i].length == 0) continue;

    if (i % 2 == 0) {
      // If the current segment is a separator like " - " or "["
      // skip past it and clip that part of the string.

      if (string.startsWith(segments[i])) {
        string = string.slice(segments[i].length);
      } else {
        throw new Error("String does not match format");
      }
    } else {
      // Find the distance until the next separator
      // (or the end of the string if there is none).
      let length = 0;
      while (!(segments[i + 1] && string.slice(length).startsWith(segments[i + 1]))
             && length < string.length) {
        length++;
      }

      // Store the key and value in the output object
      // and clip off the beginning of the string.
      output[segments[i]] = string.slice(0, length);
      string = string.slice(length);
    }
  }

  return output;
}

console.log(extract("#artist# - #song# [#id#]",
                    "Taylor Swift - Anti-Hero [ID123]"));

// Outputs { artist: "Taylor Swift", song: "Anti-Hero", id: "ID123" }
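If you would rather stay with a regex, the same idea can be sketched as follows. This is only an illustration under some assumptions: `buildPattern` and `parseWithFormat` are hypothetical names, token names must be valid and unique regex group names (letters, digits, underscore, not starting with a digit), and the runtime must support named capture groups. Literal text is escaped in one pass, and each token becomes a lazy `.+?` group so it stops at the next literal instead of being limited to word characters.

```javascript
// Hypothetical helper: turn a format such as "#artist# - #song# [#id#]"
// into an anchored RegExp with one named group per token.
function buildPattern(format) {
  // Splitting on a capturing group keeps the token names in the result:
  // even indexes are literal text, odd indexes are token names.
  const parts = format.split(/#(\w+)#/);
  const source = parts
    .map((part, i) =>
      i % 2 === 1
        ? `(?<${part}>.+?)` // lazy group: stops at the next literal
        : part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    )
    .join("");
  return new RegExp(`^${source}$`);
}

// Hypothetical helper: returns the captured tokens, or null if there is no match.
function parseWithFormat(format, input) {
  const match = buildPattern(format).exec(input);
  return match ? { ...match.groups } : null;
}

console.log(parseWithFormat("#artist# - #song# [#id#]",
                            "Taylor Swift - Anti-Hero [ID123]"));
// { artist: 'Taylor Swift', song: 'Anti-Hero', id: 'ID123' }

console.log(parseWithFormat("#artist# - #song# #id#",
                            "Taylor Swift - Anti-Hero [ID123]"));
// { artist: 'Taylor Swift', song: 'Anti-Hero', id: '[ID123]' }
```

Like the loop version, this cannot disambiguate a value that itself contains the following separator; the lazy groups simply pick the earliest split that lets the whole string match.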

CodePudding user response:

In this example you can simply split on a longer separator string:

let myString = "Taylor Swift - Anti-Hero [ID123]";
let myParts = myString.split(" - ");
// myParts is now the array ["Taylor Swift", "Anti-Hero [ID123]"]

It will still fail if a song name contains " - ", but hopefully that is rare! Hope this helps :)
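One way to soften that limitation, sketched here with a hypothetical `splitOnce` helper, is to cut only at the first " - " so that anything after it stays in one piece. Note that `split(" - ", 2)` would not do this, because the limit argument discards the remainder instead of keeping it. This still assumes the artist name itself never contains " - ".

```javascript
// Hypothetical helper: split at the first occurrence of the separator only.
function splitOnce(text, separator) {
  const index = text.indexOf(separator);
  if (index === -1) return [text]; // separator not found: return the text unchanged
  return [text.slice(0, index), text.slice(index + separator.length)];
}

const [artist, rest] = splitOnce("Taylor Swift - Anti-Hero - Remix [ID123]", " - ");
console.log(artist); // "Taylor Swift"
console.log(rest);   // "Anti-Hero - Remix [ID123]"
```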
