Bash - remove specific textblock from file

Time:08-05

I want to remove a specific block of text from a file. I want to find the start of the text block to remove, and remove everything until a specific pattern is found.

Example string to search in:

\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove

I want to remove everything starting from this string: \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component

So basically: find the pattern \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and remove everything up to, but not including, the next \n---\n

Expected output here would be:

\n---\n that I dont want to remove

Things I tried with sed:

sed 's/\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*\n---\n//g'

Things I tried with grep:

echo $string  | grep -Ewo "\\\n---\\\n# Source: app/templates/deployment.yaml\\\n# template file\napiVersion: apps/v1\\\nkind: Deployment\nmetadata:\\\n name: component"

Nothing really works. Is there any bash wizard that can help?

CodePudding user response:

Using literal strings to avoid having to escape any characters and assuming your target string only exists once in the input:

$ cat tst.sh
#!/usr/bin/env bash

awk '
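    BEGIN {
        begStr  = ARGV[1]          # literal start string (ARGV values get no escape processing)
        endStr  = ARGV[2]          # literal end delimiter
        ARGV[1] = ARGV[2] = ""     # empty them so awk does not try to open them as files
        begLgth = length(begStr)
    }
    begPos = index($0,begStr) {    # only lines that contain the start string
        tail = substr($0,begPos+begLgth)                    # text after the start string
        endPos = begPos + begLgth + index(tail,endStr) - 1  # position of the next endStr in $0
        print substr($0,1,begPos-1) substr($0,endPos)       # text before the block, then endStr onwards
    }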
' \
    '\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
    '\n---\n' \
    "${@:--}"

$ ./tst.sh file
\n---\n that I dont want to remove
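If you would rather not call an external tool, the same literal-string idea can be sketched with bash parameter expansion. This is a minimal sketch, assuming the whole input fits in one shell variable and the start string occurs once; the function name remove_block and the choice to leave the input unchanged when no closing \n---\n follows are assumptions for illustration, not part of the answer above. Quoting "$beg" and "$end" inside the expansions makes bash match them as literal text rather than as glob patterns, so no escaping is needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove everything from the first occurrence of $beg
# up to (but not including) the next occurrence of $end. All arguments are
# matched as literal strings, so \n, /, and . need no escaping.
remove_block() {
    local s=$1 beg=$2 end=$3
    # Start string absent: print the input unchanged.
    [[ $s == *"$beg"* ]] || { printf '%s\n' "$s"; return; }
    local head=${s%%"$beg"*}   # text before the first $beg
    local rest=${s#*"$beg"}    # text after the first $beg
    # No closing delimiter after $beg: also leave the input unchanged (assumption).
    [[ $rest == *"$end"* ]] || { printf '%s\n' "$s"; return; }
    # Keep the closing delimiter itself, then everything after it.
    printf '%s%s%s\n' "$head" "$end" "${rest#*"$end"}"
}

beg='\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component'
end='\n---\n'
s="$beg"' and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove'
remove_block "$s" "$beg" "$end"
```

For a file, s=$(<file) would load it, with the caveat that command substitution strips trailing newlines.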

CodePudding user response:

You need to escape the backslashes in the regexp to match them literally.

If the part between \\n---\\n123456789 and \\n---\\n can't contain another -, you can use

sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'

This assumption is needed because sed doesn't support non-greedy quantifiers, and .* will match until the last \\n---\\n, not the next one.
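To see why the [^-]* restriction matters, here is a small sketch comparing the two patterns on a made-up input (the sample string is hypothetical) that contains three \n---\n delimiters. With .* the match runs to the last delimiter; with [^-]* it cannot cross a --- and so ends at the next one.

```shell
#!/usr/bin/env bash
# Made-up sample; single quotes keep the backslash-n sequences literal.
s='A\n---\n123456789 x\n---\nkeep\n---\ntail'

# Greedy: .* extends to the LAST \n---\n, so "keep" is removed as well.
greedy=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789.*\\n---\\n//g')

# Bounded: [^-]* cannot cross a "-", so the match ends at the NEXT \n---\n.
bounded=$(printf '%s\n' "$s" | sed 's/\\n---\\n123456789[^-]*\\n---\\n//g')

printf 'greedy:  %s\n' "$greedy"    # greedy:  Atail
printf 'bounded: %s\n' "$bounded"   # bounded: Akeep\n---\ntail
```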

CodePudding user response:

So basically, find pattern \n---\n123456789 and remove everything until I match the next \n---\n

Using GNU awk, it might be simpler to make \n---\n the record separator and drop the record whose first field is 123456789, rather than matching the whole block with a single regex:

s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
awk -v RS='\\\\n---\\\\n' '$1 != 123456789 {ORS=RT; print}' <<< "$s"

aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb
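RT (the text that actually matched RS) is a GNU awk extension, so the one-liner above will not behave the same under mawk or BSD awk. As a rough, awk-free sketch of the same record-splitting idea, the loop below splits on the literal \n---\n, drops any segment whose first space-separated word equals the key, and re-emits each kept segment followed by the delimiter that terminated it, as ORS=RT does. The function name drop_record is hypothetical, and ${rec%% *} only approximates awk's $1 (it does not skip leading blanks or split on tabs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: split $1 on the literal delimiter \n---\n and drop every
# segment whose first space-separated word equals $2. Each kept segment is
# followed by the delimiter that terminated it, mirroring ORS=RT in gawk.
drop_record() {
    local s=$1 key=$2 delim='\n---\n' out='' rec has
    while :; do
        if [[ $s == *"$delim"* ]]; then
            rec=${s%%"$delim"*}   # segment up to the next delimiter
            s=${s#*"$delim"}      # remainder after that delimiter
            has=1
        else
            rec=$s                # final segment, no terminating delimiter
            has=0
        fi
        if [[ ${rec%% *} != "$key" ]]; then
            out+=$rec
            (( has )) && out+=$delim
        fi
        (( has )) || break
    done
    printf '%s\n' "$out"
}

s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
drop_record "$s" 123456789
```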

CodePudding user response:

With your shown samples, please try the following awk code. It searches for the string \\n---\\n# Source: app\/templates\/deployment.yaml\\n# template file\\napiVersion: apps\/v1\\nkind: Deployment\\nmetadata:\\n name: component, sets the field separator to \\\\n---\\\\n (followed by a space), and then prints the last field of each matching line, prefixed by that delimiter.

awk -v OFS="\\\\n---\\\\n " -F'\\\\n---\\\\n ' '
/\\n---\\n# Source: \
app\/templates\/deployment.yaml\\n# template \
file\\napiVersion: apps\/v1\\nkind: Deployment\
\\nmetadata:\\n name: component/{
  print OFS $NF
}
'  Input_file

Output will be as follows:

\n---\n that I dont want to remove
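Because print OFS $NF keeps only the last field, this answer outputs just the text after the final "\n---\n " on each matching line, so any other blocks later on the same line are dropped too. A rough bash equivalent of that step, only to make the behaviour visible (the helper name last_field is hypothetical, and unlike the awk code it does not check for the # Source: header first):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring print OFS $NF with FS and OFS both set to the
# literal text \n---\n followed by a space: drop everything up to and including
# the LAST delimiter, then put one delimiter back in front of what remains.
last_field() {
    local line=$1 delim='\n---\n '
    printf '%s%s\n' "$delim" "${line##*"$delim"}"
}

# The question's sample line gives the expected output.
line='\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove'
last_field "$line"                # \n---\n that I dont want to remove

# Only the text after the final delimiter survives, so "b" is dropped here.
last_field 'a\n---\n b\n---\n c'  # \n---\n c
```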