Remove multiline string after equal sign

I am trying to fetch IAM policies from AWS and write them to a text file in the format below. In that file, I want to delete everything after policy= up to the closing "}" of each block.

This is a sample of the text file. The original file can contain multiple example_policy blocks.

"example1_policy" {
    name="example1"
    policy=jsonencode(
        {
            Statement=[
                {
                    Action=[
                        s3:*
                    ]
                    Effect="Allow"
                },
            ]
            Version="2012-10-17"
        }
    )
}
"example2_policy" {
    name="example2"
    policy=jsonencode(
        {
            Statement=[
                {
                    Action=[
                        s3:*
                    ]
                    Effect="Allow"
                },
            ]
            Version="2012-10-17"
        }
    )
}
"example3_policy" {
    name="example3"
    policy=jsonencode(
        {
            Statement=[
                {
                    Action=[
                        s3:*
                    ]
                    Effect="Allow"
                },
            ]
            Version="2012-10-17"
        }
    )
}

Expected Output:

"example1_policy" {
    name="example1"
    policy=
}
"example2_policy" {
    name="example2"
    policy=
}
"example3_policy" {
    name="example3"
    policy=
}

Or, with a placeholder after policy=:

"example1_policy" {
    name="example1"
    policy=<placeholder>
}
"example2_policy" {
    name="example2"
    policy=<placeholder>
}
"example3_policy" {
    name="example3"
    policy=<placeholder>
}

Following @Wiktor's comment, I tried this command:

sed -i '/policy=/,/^\s*)\s*$/d' test.txt

Output (the whole policy= line is removed, but policy= should remain intact):

"example_policy" {
    name="example"
}

CodePudding user response：

You could do this quite easily in Python, since you can keep state while scanning the text:

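def clean(s, pattern='policy='):
  # Assumes the pattern occurs exactly once in s.
  pre, post = s.split(pattern)
  harmony = True   # True while every parenthesis seen so far is closed
  quote = False    # the quote character we are inside of, if any
  braces = 0       # current parenthesis nesting depth
  for i, char in enumerate(post):
    # The first newline outside any parentheses ends the value.
    if harmony and char == '\n':
      return f'{pre}{pattern}{post[i:]}'
    if not quote:
      if char == '(':
        braces += 1
      elif char == ')':
        braces -= 1
      elif char in ('"', "'"):
        quote = char
      harmony = braces == 0
    elif quote == char:
      quote = False
  # No newline found outside parentheses: leave the input unchanged.
  return s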

This also ignores parentheses that are enclosed in strings (both " and ' strings).

So, this version works on trickier strings too:

"example_policy" {
    name="example"
    policy=jsonencode(
        {
            Statement=[
                {
                    Action=[
                        s3:*
                    ]
                    Effect="Al)l'ow"
                },
            ]
            Version='"2012-10)-17"'
        }
    )
}

You can easily extend this to support other types of braces or quotation. The only difference is that you need to use counters for braces since the opening and the closing characters are different, while for quotations you just need to add extra characters to the matching list - since everything else within the quotation is ignored, you just need to remember which character the quote was opened with.
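
As a rough sketch of that extension (not the answerer's code): the version below tracks several bracket types and an extra quote character. It uses a stack of expected closing brackets instead of one counter per bracket type, which is a substitution on my part, and the names clean_nested, PAIRS and QUOTES are hypothetical.

```python
# Hypothetical generalisation of clean(); all names here are illustrative.
PAIRS = {'(': ')', '[': ']', '{': '}'}   # opening bracket -> closing bracket
QUOTES = ('"', "'", '`')                 # backtick added purely as an example


def clean_nested(s, pattern='policy='):
    """Drop the (possibly multi-line) value after the first `pattern`."""
    pre, sep, post = s.partition(pattern)
    if not sep:                # pattern not present: nothing to do
        return s
    stack = []                 # closing brackets we are still waiting for
    quote = None               # the quote character we are inside of, if any
    for i, char in enumerate(post):
        if quote:
            if char == quote:  # only the opening quote character closes it
                quote = None
        elif char in QUOTES:
            quote = char
        elif char in PAIRS:
            stack.append(PAIRS[char])
        elif stack and char == stack[-1]:
            stack.pop()
        elif char == '\n' and not stack:
            # First newline outside all brackets ends the value.
            return pre + sep + post[i:]
    return pre + sep           # the value ran to the end of the input
```

Unmatched closing brackets are simply ignored here; stricter handling would be a design choice for the caller.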

Doing this with regexes would be trickier, since classic regular expressions cannot match arbitrarily deep parenthesis nesting.

To remove multiple policies within the same string, first define a helper function that cleans the start of a single chunk:

def clean(s):
  harmony = True
  quote = False
  braces = 0
  for i, char in enumerate(s):
    if harmony and char == '\n':
      return s[i:]
    if not quote:
      if char == '(':
        braces += 1
      elif char == ')':
        braces -= 1
      elif char in ('"', "'"):
        quote = char
      harmony = braces == 0
    elif quote == char:
      quote = False
  return s

And then the main function that will apply the "cleaning helper" to each individual chunk:

def clean_all(s, pattern='policy='):
  head, *tail = s.split(pattern)
  if not tail:
    # Pattern not found: return the input unchanged.
    return s
  return f'{head}{pattern}{pattern.join(clean(part) for part in tail)}'
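
A quick usage check may help here. The sketch below restates both functions so it runs on its own (with the parenthesis counter incremented via += and an early return when the pattern is absent, which I am assuming is the intent), then applies clean_all to a made-up two-policy text.

```python
def clean(s):
    """Strip the leading value from s, up to the first newline outside parentheses."""
    harmony = True   # True while every parenthesis seen so far is closed
    quote = False    # the quote character we are inside of, if any
    braces = 0       # current parenthesis nesting depth
    for i, char in enumerate(s):
        if harmony and char == '\n':
            return s[i:]
        if not quote:
            if char == '(':
                braces += 1
            elif char == ')':
                braces -= 1
            elif char in ('"', "'"):
                quote = char
            harmony = braces == 0
        elif quote == char:
            quote = False
    return s


def clean_all(s, pattern='policy='):
    head, *tail = s.split(pattern)
    if not tail:     # pattern not found: nothing to clean
        return s
    return head + pattern + pattern.join(clean(part) for part in tail)


# Made-up sample with two policy blocks; the second hides a ")" in a string.
text = (
    '"a_policy" {\n'
    '    name="a"\n'
    '    policy=jsonencode(\n'
    '        {Version="2012-10-17"}\n'
    '    )\n'
    '}\n'
    '"b_policy" {\n'
    '    name="b"\n'
    '    policy=jsonencode(\n'
    '        {Effect="Al)low"}\n'
    '    )\n'
    '}\n'
)
result = clean_all(text)
print(result)
```

To process a file, read it into a string, pass it through clean_all, and write the result back.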

CodePudding user response：

You can use the following GNU sed command:

sed -i '/policy=/,/^\s*)\s*$/{/policy=/!d};s/\(policy=\).*/\1<placeholder>/' file

Details:

/policy=/,/^\s*)\s*$/ - finds blocks of lines from a line containing policy= to a line that contains only a ) char surrounded by zero or more whitespace characters
{/policy=/!d} - keeps the first line of the found block (the one with policy=) and deletes the other line(s)
s/\(policy=\).*/\1<placeholder>/ - replaces everything after policy= with <placeholder>.
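
For readers without GNU sed, the same line-range idea can be roughly mirrored in Python. This is only a sketch of the logic described above, not a drop-in replacement for sed; replace_policies and the placeholder default are hypothetical names.

```python
import re

START = re.compile(r'policy=')        # line that opens a block
END = re.compile(r'^\s*\)\s*$')       # line holding only ")" closes it


def replace_policies(text, placeholder='<placeholder>'):
    """Keep each policy= line with a placeholder; drop the rest of its block."""
    out = []
    in_block = False
    for line in text.splitlines(keepends=True):
        if in_block:
            # Inside the range: drop the line; a ")"-only line ends the range.
            if END.match(line):
                in_block = False
            continue
        if START.search(line):
            in_block = True
            # "." does not match the newline, so the line ending is preserved.
            line = re.sub(r'(policy=).*',
                          lambda m: m.group(1) + placeholder, line)
        out.append(line)
    return ''.join(out)
```

Like the sed range, this relies on the closing ) sitting alone on its own line, so it is less robust than the character-scanning approach when the layout varies.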