Bash - remove specific text block from file

I want to remove a specific block of text from a file. I want to find the start of the text block to remove, and remove everything until a specific pattern is found.

Example string to search in:

\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove

I want to remove everything, starting from this string match \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component

So basically, find pattern \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and remove everything until I match the next \n---\n

Expected output here would be:

\n---\n that I dont want to remove

Things I tried with sed:

sed 's/\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*\n---\n//g'

Things I tried with grep:

echo $string  | grep -Ewo "\\\n---\\\n# Source: app/templates/deployment.yaml\\\n# template file\napiVersion: apps/v1\\\nkind: Deployment\nmetadata:\\\n name: component"

Nothing really works. Is there any bash wizard that can help?

CodePudding user response：

Using literal strings to avoid having to escape any characters and assuming your target string only exists once in the input:

$ cat tst.sh
#!/usr/bin/env bash

awk '
    BEGIN {
        begStr  = ARGV[1]
        endStr  = ARGV[2]
        ARGV[1] = ARGV[2] = ""
        begLgth = length(begStr)
    }
    begPos = index($0,begStr) {
        tail = substr($0,begPos+begLgth)
        endPos = begPos + begLgth + index(tail,endStr) - 1
        print substr($0,1,begPos-1) substr($0,endPos)
    }
' \
    '\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
    '\n---\n' \
    "${@:--}"

$ ./tst.sh file
\n---\n that I dont want to remove
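The same literal-string idea can also be sketched with shell parameter expansion alone, for the case where the text is already held in a variable. This is only a sketch, not part of the answer above: `remove_block` is a hypothetical helper name, and it assumes the begin string occurs at most once and that the input is small enough to keep in memory. Quoting `"$beg"` and `"$end"` inside the expansions makes the shell treat them as literal text, so no character needs escaping.

```shell
#!/usr/bin/env bash
# remove_block STRING BEGIN END   (hypothetical helper, not from the answer above)
# Deletes everything from the first BEGIN up to, but not including, the next END.
# The body runs in a subshell, so its variables do not leak to the caller.
remove_block() (
    s=$1 beg=$2 end=$3
    case $s in
        *"$beg"*) ;;                        # begin marker present, carry on
        *) printf '%s\n' "$s"; exit ;;      # no begin marker: print input unchanged
    esac
    head=${s%%"$beg"*}                      # text before the first BEGIN
    rest=${s#*"$beg"}                       # text after the first BEGIN
    case $rest in
        *"$end"*) ;;                        # end marker found after BEGIN
        *) printf '%s\n' "$s"; exit ;;      # no end marker: print input unchanged
    esac
    tail=${rest#*"$end"}                    # text after the first END that follows BEGIN
    printf '%s%s%s\n' "$head" "$end" "$tail"
)

sample='\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and more -- / text \n---\n that I dont want to remove'
remove_block "$sample" \
    '\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
    '\n---\n'
# prints: \n---\n that I dont want to remove
```

For a file, the content could be passed as "$(cat file)", keeping in mind that command substitution strips trailing newlines.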

CodePudding user response：

You need to escape the backslashes in the regexp to match them literally.

If the part between \\n---\\n123456789 and \\n---\\n can't contain another -, you can use

sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'

This assumption is needed because sed doesn't support non-greedy quantifiers, and .* will match until the last \\n---\\n, not the next one.
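A small demonstration of the difference, using a made-up sample string with two separators after the block (the variable names `s`, `greedy`, and `bounded` are only for illustration):

```shell
# Sample with a block to remove followed by two more separators.
s='a\n---\n123456789 x\n---\nb\n---\nc'

# Greedy .* runs to the LAST \n---\n, so "b" is lost as well.
greedy=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789.*\\n---\\n//')

# [^-]* cannot cross a dash, so the match stops at the NEXT \n---\n.
bounded=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789[^-]*\\n---\\n//')

printf '%s\n' "$greedy"    # prints: ac
printf '%s\n' "$bounded"   # prints: ab\n---\nc
```

Note that both commands also remove the closing \n---\n, since it is part of the matched text.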

CodePudding user response：

So basically, find pattern \n---\n123456789 and remove everything until I match the next \n---\n

With GNU awk, it might be simpler to make \n---\n the record separator (a non-regex approach):

s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
awk -v RS='\\\\n---\\\\n' '$1 != 123456789 {ORS=RT; print}' <<< "$s"

aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb

CodePudding user response：

With your shown samples, please try the following awk code. It matches lines containing the string \\n---\\n# Source: app\/templates\/deployment.yaml\\n# template file\\napiVersion: apps\/v1\\nkind: Deployment\\nmetadata:\\n name: component, sets the field separator to \\\\n---\\\\n followed by a space, and prints the last field of each matching line.

awk -v OFS="\\\\n---\\\\n " -F'\\\\n---\\\\n ' '
/\\n---\\n# Source: \
app\/templates\/deployment.yaml\\n# template \
file\\napiVersion: apps\/v1\\nkind: Deployment\
\\nmetadata:\\n name: component/{
  print OFS $NF
}
'  Input_file

Output will be as follows:

\n---\n that I dont want to remove
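If a line can contain several blocks, the field-separator idea might be extended to drop only the unwanted fields instead of printing just the last one. The following is a rough sketch under stated assumptions, not the answerer's code: `drop_blocks` and `skip` are hypothetical names, every unwanted block is assumed to start immediately after a separator with the given header text, and the header passed in must not contain backslashes (awk interprets escape sequences in -v values).

```shell
# drop_blocks HEADER   (hypothetical helper)
# Reads lines on stdin, splits each on the literal text \n---\n, and
# rebuilds the line without any field that starts with HEADER.
drop_blocks() {
    awk -F'\\\\n---\\\\n' -v skip="$1" '
    {
        out = $1                          # text before the first separator
        for (i = 2; i <= NF; i++)
            if (index($i, skip) != 1)     # keep fields that do not start with HEADER
                out = out "\\n---\\n" $i
        print out
    }'
}

line='keep1\n---\n# Source: app/templates/deployment.yaml\n# template file\nkind: Deployment -- / x\n---\n that I dont want to remove'
printf '%s\n' "$line" | drop_blocks '# Source: app/templates/deployment.yaml'
# prints: keep1\n---\n that I dont want to remove
```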