I want to remove a specific block of text from a file. I want to find the start of the text block to remove, and remove everything until a specific pattern is found.
Example string to search in:
\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove
I want to remove everything, starting from this string match \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component
So basically, find pattern \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component
and remove everything until I match the next \n---\n
Expected output here would be:
\n---\n that I dont want to remove
Things I tried with sed:
sed 's/\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*\n---\n//g'
Things I tried with grep:
echo $string | grep -Ewo "\\\n---\\\n# Source: app/templates/deployment.yaml\\\n# template file\napiVersion: apps/v1\\\nkind: Deployment\nmetadata:\\\n name: component"
Nothing really works. Is there any bash wizard that can help?
CodePudding user response:
Using literal strings to avoid having to escape any characters and assuming your target string only exists once in the input:
$ cat tst.sh
#!/usr/bin/env bash
awk '
    BEGIN {
        # Take both strings from the command line so they are used literally
        begStr = ARGV[1]
        endStr = ARGV[2]
        ARGV[1] = ARGV[2] = ""      # blank them so awk does not treat them as files
        begLgth = length(begStr)
    }
    (begPos = index($0,begStr)) {
        tail = substr($0,begPos + begLgth)
        endPos = begPos + begLgth + index(tail,endStr) - 1
        print substr($0,1,begPos-1) substr($0,endPos)
    }
' \
'\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
'\n---\n' \
"${@:--}"
$ ./tst.sh file
\n---\n that I dont want to remove
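If the target string can occur more than once, the same index/substr idea can be put in a loop. This is only a sketch under my own assumptions: the function name remove_block and the sample string are made up for illustration, both strings are non-empty, and a block with no closing delimiter is left untouched.

```shell
#!/usr/bin/env bash
# remove_block BEG END: hypothetical helper, reads stdin and treats
# BEG and END as literal strings (no regex, no escaping needed).
remove_block() {
    awk '
    BEGIN {
        begStr = ARGV[1]
        endStr = ARGV[2]
        ARGV[1] = ARGV[2] = ""      # blank them so awk does not open them as files
    }
    {
        line = $0
        # Repeat until no start string is left on the line
        while ((begPos = index(line, begStr)) > 0) {
            tail = substr(line, begPos + length(begStr))
            endPos = index(tail, endStr)
            if (endPos == 0) break  # no closing delimiter: keep the rest as-is
            # Keep the text before the block, then resume at the closing delimiter
            line = substr(line, 1, begPos - 1) substr(tail, endPos)
        }
        print line
    }' "$1" "$2" -
}

# Made-up sample with two blocks to drop (literal backslash-n, not newlines)
sample='a\n---\n111 x\n---\nb\n---\n111 y\n---\nc'
out=$(printf '%s\n' "$sample" | remove_block '\n---\n111' '\n---\n')
printf '%s\n' "$out"    # prints: a\n---\nb\n---\nc
```

Unlike the script above, this prints every input line, including lines that do not contain the start string.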
CodePudding user response:
You need to escape the backslashes in the regexp to match them literally.
If the part between \\n---\\n123456789 and \\n---\\n can't contain another -, you can use:
sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'
This assumption is needed because sed doesn't support non-greedy quantifiers, and .* will match until the last \\n---\\n, not the next one.
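To see the difference between the greedy and the bounded match, here is a small comparison. The sample string is made up for illustration, and the \n sequences in it are literal backslash-n characters as in the question, not real newlines.

```shell
#!/usr/bin/env bash
# Made-up sample: the marker block is followed by two more \n---\n delimiters
s='keep1\n---\n123456789 drop me\n---\nkeep2\n---\nkeep3'

# Greedy .* runs to the LAST \n---\n, so keep2 is removed as well
greedy=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789.*\\n---\\n//')

# [^-]* cannot cross a dash, so the match stops at the NEXT \n---\n
bounded=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789[^-]*\\n---\\n//')

printf '%s\n' "$greedy"     # prints: keep1keep3
printf '%s\n' "$bounded"    # prints: keep1keep2\n---\nkeep3
```

Note that both commands also remove the closing \n---\n itself, because it is part of the matched text.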
CodePudding user response:
So basically, find pattern \n---\n123456789 and remove everything until I match the next \n---\n
Using GNU awk it might be simpler to make \n---\n the record separator (a non-regex approach):
s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
awk -v RS='\\\\n---\\\\n' '$1 != 123456789 {ORS=RT; print}' <<< "$s"
aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb
CodePudding user response:
With your shown samples, please try the following awk code. It searches for the string \\n---\\n# Source: app\/templates\/deployment.yaml\\n# template file\\napiVersion: apps\/v1\\nkind: Deployment\\nmetadata:\\n name: component, uses \\\\n---\\\\n as the field separator, and then prints the last field of that line.
awk -v OFS="\\\\n---\\\\n " -F'\\\\n---\\\\n ' '
/\\n---\\n# Source: \
app\/templates\/deployment.yaml\\n# template \
file\\napiVersion: apps\/v1\\nkind: Deployment\
\\nmetadata:\\n name: component/{
print OFS $NF
}
' Input_file
Output will be as follows:
\n---\n that I dont want to remove