How do I get the first substring after a specific substring in a string?

Time:11-16

I have multiple text files to process, and from each one I want the version number in the 'banana' package section. Here is one example:

Package: apple
Settings: scim
Architecture: amd32
Size: 2312312312

Package: banana
Architecture: xsl64
Version: 94.3223.2
Size: 23232

Package: orange
Architecture: bbl64
Version: 14.3223.2
Description: Something descrip
 more description to orange

Package: friday
SHA215: d3d223d3f2ddf2323d3
Person: XCXCS
Size: 2312312312

What I know:

  • Package: [name] is always first line in a section.
  • Not all sections have a Package: [name] line.
  • Package: banana section always has a Version: line.
  • The position of the Version: line varies (it can be the second, fifth, or last line of the section).
  • The position of the Package: banana section varies. It can be at the start, middle, or end of the document.
  • The Version: [number] value differs from file to file.

I want to find the version number in the banana package section, so 94.3223.2 in the example. I don't want to hardcode a line-by-line loop; I'm looking for a cleaner solution.

I have tried something like this, but unfortunately it doesn't work for every scenario:

firstOperation = textFile.split('Package: banana').pop();
secondOperation = firstOperation.split('\n');
finalString = secondOperation[1].split('Version: ').pop();

My logic would be:

  1. Find Package: banana line
  2. Find the first occurrence of 'Version:' after the Package: banana line, then extract the version number from that line.

CodePudding user response:

This kind of text extraction is always pretty fragile, so let me know if this works for your real inputs. Anyway, if we split the text on empty lines (which are really just double line breaks, \n\n) and then split each "paragraph" on \n, we get chunks of lines we can work with.

Then we can just find the chunk that has the banana package, and then inside that chunk, we find the line that contains the version.

Finally, we slice off Version: to get the version text.

const text = `\
Package: apple
Settings: scim
Architecture: amd32
Size: 2312312312

Package: banana
Architecture: xsl64
Version: 94.3223.2
Size: 23232

Package: orange
Architecture: bbl64
Version: 14.3223.2
Description: Something descrip
 more description to orange

SHA215: d3d223d3f2ddf2323d3
Person: XCXCS
Size: 2312312312
`;

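// Split into sections on blank lines, then split each section into lines.
const chunks = text.split("\n\n").map((p) => p.split("\n"));

const version = chunks
    // Find the section that contains the banana package line...
    .find((info) => info.some((line) => line === "Package: banana"))
    // ...then the version line inside it (?. gives undefined if a lookup fails)...
    ?.find((line) => line.startsWith("Version: "))
    // ...and drop the "Version: " prefix.
    ?.slice("Version: ".length);

console.log(version); // 94.3223.2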
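
If you'd rather stay close to the two-step logic from the question (find the Package: banana line, then the first Version: line after it), here is a minimal sketch using indexOf. The helper name versionAfterPackage and the shortened sample are made up for illustration. It assumes \n line endings and sections separated by a blank line, and it stops at the end of the section so it never picks up another package's version.

```javascript
// Hypothetical helper: returns the first "Version: " value that follows the
// "Package: <name>" line within the same section, or null if there is none.
function versionAfterPackage(text, name) {
  // Prefix "\n" so a package line at the very start of the file matches too.
  const haystack = "\n" + text;

  // Step 1: locate the exact package line.
  const header = `\nPackage: ${name}\n`;
  const start = haystack.indexOf(header);
  if (start === -1) return null;

  // Index of the "\n" that ends the package line.
  const from = start + header.length - 1;

  // The section ends at the next blank line (or at the end of the text).
  const blank = haystack.indexOf("\n\n", from);
  const sectionEnd = blank === -1 ? haystack.length : blank;

  // Step 2: first "Version: " line after the package line, inside the section.
  const marker = "\nVersion: ";
  const at = haystack.indexOf(marker, from);
  if (at === -1 || at > sectionEnd) return null;

  const valueStart = at + marker.length;
  const lineEnd = haystack.indexOf("\n", valueStart);
  return haystack.slice(valueStart, lineEnd === -1 ? undefined : lineEnd).trim();
}

// Shortened version of the sample from the question.
const sample = `Package: apple
Settings: scim

Package: banana
Architecture: xsl64
Version: 94.3223.2
Size: 23232

Package: orange
Version: 14.3223.2`;

console.log(versionAfterPackage(sample, "banana"));
```

Whether this is nicer than the chunk approach is a matter of taste: it avoids building intermediate arrays, but the index bookkeeping is easier to get wrong.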

CodePudding user response:

To make this slightly more extensible, you can convert each section to an object:

function process(input) {
  let data = input.split("\n\n"); // split into sections on blank lines
  data = data.map(i => i.split("\n")); // split each section into lines
  data = data.map(i => i.reduce((obj, cur) => {
    const [key, val] = cur.split(": "); // get the key and value
    obj[key.toLowerCase()] = val; // lowercase the key to make it a nice object
    return obj;
  }, {}));
  return data;
}

const input = `Package: apple
Settings: scim
Architecture: amd32
Size: 2312312312

Package: banana
Architecture: xsl64
Version: 94.3223.2
Size: 23232

Package: orange
Architecture: bbl64
Version: 14.3223.2
Description: Something descrip
 more description to orange

Package: friday
SHA215: d3d223d3f2ddf2323d3
Person: XCXCS
Size: 2312312312`;

const data = process(input);
// "package" is a reserved word in strict mode, so rename it while destructuring
const { version } = data.find(({ package: name }) => name === "banana"); // query data
console.log("Banana version:", version);
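
One caveat with splitting every line on ": ": an indented continuation line such as " more description to orange" has no key, so it ends up as a junk property, and a value that itself contains ": " gets cut off. Below is a rough sketch of a variant that handles both. parseSections and sampleInput are names invented for this example, and treating an indented line as a continuation of the previous field is an assumption about the file format, not something the question confirms.

```javascript
// Hypothetical variant of process(): one object per section, lowercase keys.
function parseSections(input) {
  return input
    .split(/\n{2,}/) // sections are separated by one or more blank lines
    .filter((section) => section.trim() !== "")
    .map((section) => {
      const record = {};
      let lastKey = null;
      for (const line of section.split("\n")) {
        if (/^\s/.test(line) && lastKey !== null) {
          // Indented line: assumed to continue the previous field's value.
          record[lastKey] += "\n" + line.trim();
          continue;
        }
        const sep = line.indexOf(":"); // split on the first colon only
        if (sep === -1) continue; // not a "Key: value" line, skip it
        lastKey = line.slice(0, sep).trim().toLowerCase();
        record[lastKey] = line.slice(sep + 1).trim();
      }
      return record;
    });
}

const sampleInput = `Package: banana
Version: 94.3223.2

Package: orange
Version: 14.3223.2
Description: Something descrip
 more description to orange`;

const sections = parseSections(sampleInput);
const banana = sections.find((s) => s.package === "banana");
console.log("Banana version:", banana?.version);
```

Sections without a Package: line simply produce an object with no package key, so the find call skips them.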
