I'm trying to parse a data set in JavaScript, using regex to pick out certain sections. One particular section consists of string-number pairs, starting with a known string (e.g. "Amount") and ending with a known string (e.g. "End"). The number of pairs is not known in advance; it can be anywhere from 1 to x. For example, here's one data set with 3 such pairs:
Donation
Amount
Red Cross
1,630.83
Humanity First
500.00
Global Health
100.00
End
I ran the data set through the following commands to strip the line breaks:
body = body.replace(/\n/g,"");
body = body.replace(/\r/g,"");
This results in the following string:
DonationAmountRed Cross1,630.83Humanity First500.00Global Health100.00End
If the number of entries is fixed and known, say 1 entry, I know the regex for parsing it:
const REGEX_DONATION_AND_AMOUNT = new RegExp("DonationAmount([a-zA-Z\\s]+)(\\d+\\.\\d+)End");
receiptData.donation = body.match(REGEX_DONATION_AND_AMOUNT)[1];
receiptData.amount = body.match(REGEX_DONATION_AND_AMOUNT)[2];
But when the number of entries is variable, how can I loop through and parse this to get an array of donation type and amount pairs? Thank you for any help you can provide!
CodePudding user response:
Do this in two steps.
First get everything between "Amount" and "End".
Then use a second regexp to get each item and amount.
let body = `Donation
Amount
Red Cross
1,630.83
Humanity First
500.00
Global Health
100.00
End`;
let donationString = body.match(/Donation\nAmount\n(.*)End/s)[1];
let donationItems = [...donationString.matchAll(/(.*)\n(.*)\n/g)];
console.log(donationItems);
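If plain objects are easier to work with than raw match arrays, the same two steps can be wrapped in a small function. This is only a sketch: the names parseDonations, donation, and amount are illustrative and not part of the answer above, and it assumes that commas in amounts are thousands separators and that "Amount" and "End" each sit on their own line.

```javascript
// Hypothetical helper: returns [{ donation, amount }, ...] for the first
// Amount ... End block found in the text. All names here are illustrative.
function parseDonations(text) {
  // Step 1: capture everything between the "Amount" line and the "End" line.
  // m makes ^ and $ match at line boundaries; s lets . span line breaks.
  const block = text.match(/^Amount\r?\n(.*?)^End$/ms);
  if (!block) return [];
  // Step 2: each pair is a name line followed by an amount line.
  return [...block[1].matchAll(/(.+)\r?\n(.+)\r?\n/g)].map(([, donation, amount]) => ({
    donation,
    // Assumption: commas are thousands separators, so strip them before parsing.
    amount: Number(amount.replace(/,/g, "")),
  }));
}

const sample =
  "Donation\nAmount\nRed Cross\n1,630.83\nHumanity First\n500.00\nGlobal Health\n100.00\nEnd";
console.log(parseDonations(sample));
```

Converting the amounts to numbers is optional; drop the Number(...) call if you want to keep the strings exactly as they appear in the input.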
CodePudding user response:
This JavaScript procedure uses .split to break the input data into an array where each element is one line of your starting list.
A regular loop iterates over the array looking for 'Amount'. When it is found, the values in the elements following it are added to a results array in pairs. After each pair is added, the next element is checked for 'End'. If it is not 'End', the next pair of lines is added to the results array, and so on until 'End' is encountered (the loop then resumes on the next element to repeat the process). Any data between 'End' and the next 'Amount' is ignored.
(I've expanded your input with some repetition to show the effect of more than one block with different numbers of pairs in each)
const linesArray = `Donation
Amount
Red Cross
1,630.83
Humanity First
500.00
Global Health
100.00
End
Donation
Amount
Red Cross
1,630.83
Global Health
100.00
End`.split('\n');
const dataPairs = [];
for (let i = 0; i < linesArray.length; i++) {
  if (linesArray[i] == 'Amount') {
    // consume two lines per pair until the next line is 'End'
    while (linesArray[i + 1] != 'End') {
      dataPairs.push([linesArray[++i], linesArray[++i]]);
    } // wend;
  } // end if Amount;
  if (linesArray[i] == 'End') continue;
} // next i;
console.log(dataPairs);
As presented, data pairs are not segregated by block, and each pair is stored in its own two-element array. The code could be modified to add each pair as a two-property object if required. The loop structure would remain the same.
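One possible way to make both modifications is sketched below: it keeps the same loop structure, stores each pair as an object, and collects the pairs of each block in their own array. The names parseBlocks, type, and amount are hypothetical. The extra bounds check is an addition, meant to stop the inner loop if the input is missing its closing 'End'.

```javascript
// Sketch only: parseBlocks, type, and amount are hypothetical names.
function parseBlocks(lines) {
  const blocks = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i] !== 'Amount') continue; // ignore everything outside a block
    const pairs = [];
    // Consume two lines per pair until 'End'. The bounds check guards
    // against input that never reaches a closing 'End'.
    while (i + 2 < lines.length && lines[i + 1] !== 'End') {
      pairs.push({ type: lines[++i], amount: lines[++i] });
    }
    blocks.push(pairs); // one array of pairs per Amount ... End block
  }
  return blocks;
}

const input = [
  'Donation', 'Amount', 'Red Cross', '1,630.83', 'Humanity First', '500.00', 'End',
  'Donation', 'Amount', 'Global Health', '100.00', 'End',
];
console.log(parseBlocks(input));
```

With your own data, the call would be parseBlocks(body.split('\n')), or body.split(/\r?\n/) if the input may contain Windows line endings.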