Complex Regex (line breaks, multiple variables)

Time:11-02

I have an issue where I need to identify this pattern:

"series":[{
"name":"Some variable thing",
"data":[,,,]}

The pattern spans line breaks, "Some variable thing" will be of arbitrary length, and the number of commas in the "data" brackets will vary.

I could probably figure this out with a few hours of effort, but maybe somebody here can give me a kick start.

Edit

Here is a truncated sample of the HTML file that is produced:

 <!DOCTYPE html>
 <html><body><div class="main-section"></div><script>var options = {
"chart":{
"id":"ChartID-qn5g8ay9",
"height":"350px",
"type":"bar",
"stacked":true},
"title":{
"text":"Fictional Books Sales"},
"legend":{
"show":true,
"position":"top"},
"plotOptions":{
"bar":{
"horizontal":true}},
"dataLabels":{
"enabled":true,
"offsetX":-6,
"style":{
"fontSize":"12px"}},
"series":[{
"name":"Tank Picture",
"data":[,,,,,,]},{
"name":"Bucket Slope",
"data":[53,32,33,52,13,43,32]}],
"xaxis":{
"categories":["2008","2009","2010","2011","2012","2013","2014"]}}
var chart = new ApexCharts(document.querySelector('#ChartID-qn5g8ay9'),
            options
        );
chart.render();

Note that there will be a variable number of charts contained in the file, but I am only concerned with replacing the first instance of "data":[,,,,,,]},{ that follows each instance of "series":[{ with "data":[0,,,,,,]},{, where the number of array members (commas) is variable.

Answer:

Your input mixes data formats (HTML, JavaScript, JSON), which makes extracting and selectively modifying the embedded JSON data a challenge, so a regex solution is indeed probably simplest.

Use the regex-based -replace operator:

# Insert a 0 before the first comma of the first "data" array that follows each "series":[{
(Get-Content -Raw file.htm) -replace `
  '(?<="series"\s*:\s*\[\{\s*"name"\s*:\s*"[^"]+",\s*"data"\s*:\s*\[),',
  '0,'
  • Get-Content -Raw reads the input file as a whole, as a single, multi-line string, which enables matching across lines.

  • A positive lookbehind assertion ((?<=...)) matches the text preceding the , of interest without making it part of the match, so only that , is matched and replaced with 0, (the text matched by the lookbehind is left untouched).

  • For robustness, \s* is inserted to match varying amounts of whitespace, if any, in places where they have no syntactic meaning in the JSON string, to guard against incidental formatting variations.
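
As a quick check of the points above, here is a minimal sketch that applies the same -replace operation to an in-memory string instead of a file. The $sample text is a shortened copy of the series block from the question, and the variable names ($sample, $regex, $result) are illustrative assumptions, not part of the original answer.

```powershell
# Hypothetical sample: two series entries; only the first has an empty data array.
$sample = @'
"series":[{
"name":"Tank Picture",
"data":[,,,,,,]},{
"name":"Bucket Slope",
"data":[53,32,33,52,13,43,32]}],
'@

# The lookbehind anchors on "series":[{ "name":"...", "data":[ and the pattern
# itself matches only the comma that immediately follows the opening bracket.
$regex = '(?<="series"\s*:\s*\[\{\s*"name"\s*:\s*"[^"]+",\s*"data"\s*:\s*\[),'

# Replace that single comma with "0," and print the result.
$result = $sample -replace $regex, '0,'
$result
```

The first array should come out as "data":[0,,,,,,] while the "Bucket Slope" entry stays unchanged, because its "data" key is not directly preceded by "series":[{ and its array does not start with a comma. Note that -replace only returns the modified string; to persist it, the result would still have to be written back, for example by piping it to Set-Content -NoNewline with a path of your choosing.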

To experiment with the regex interactively, see this regexstorm.net page.
