“sed” replace ambiguous pattern along with argument

Time:05-13

I've got a bunch of logs in which I regularly need to replace sensitive information such as passwords and hostnames. Let's say all occurrences need to be replaced with the keyword REMOVED. I cannot delete them, as there needs to be proof that the data was there.

Using sed is a requirement.

Unfortunately, I run into problems with a few use cases:

Here the target is the “password” keyword along with its argument (the actual password). Everything after it is supposed to stay untouched (do-not-delete). Multiple “password” combinations are expected, like:

password secret123 do-not-delete
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete

Here are a few examples for hostnames. I expect “web-*” and “chicagonode*”, and, as above, everything after the hostname must stay:

web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete

I tried something like this, but it doesn't work:

sed "s/password. \[:alnum:\]/REMOVED/gi" logfile.txt

Does someone have an idea how to solve this puzzle? It can be multiple sed commands; one-liners are not necessary.

EDIT:

Thanks HatLess! Your command works, but it also replaces words that are not supposed to be replaced, e.g. in the "one" to "six" lines of the example below:

parallels@debian-gnu-linux-10:/media/psf/Home$ cat input 
one two three
four fife
six
password secret123 
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete
web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete
parallels@debian-gnu-linux-10:/media/psf/Home$ sed 's/\(password[^[:alpha:]]*\)\?[^ ]*\(.*\)/REMOVED\2/' input
REMOVED two three
REMOVED fife
REMOVED
REMOVED 
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete

We are trying to remove only "password*secret123", "web-*", "chicagonode*".

Sorry for the confusion.

CodePudding user response:

Using sed

$ cat input_file
password secret123 do-not-delete
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete
web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete
$ sed 's/\(password[^[:alpha:]]*\)\?[^ ]*\(.*\)/REMOVED\2/' input_file
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
  • \(password[^[:alpha:]]*\)\? - Optional group. When present, it matches the keyword password plus any non-alphabetic characters that follow it (spaces, ":" or "="), stopping at the next alphabetic character. The \? makes the group optional. Although it is a capture group, its text is dropped because the replacement never references \1.

  • [^ ]* - Matches the run of non-space characters that follows: the password value after a password prefix, or the first word of the line (such as a hostname) when there is no prefix. It is outside any capture group, so it is dropped as well. Because the group above is optional, this also matches the first word of lines that contain neither a password nor a hostname.

  • \(.*\) - Matches the rest of the line. This is the second capture group, so it can be restored with back-reference \2.

  • REMOVED\2 - The replacement: the literal REMOVED followed by the text captured in the second group.
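
To address the follow-up in the question (lines without a password or a hostname must stay untouched), here is a minimal sketch that uses one targeted substitution per kind of secret instead of an optional group. It assumes GNU sed, since -E, the I flag and \b are GNU extensions (the Debian prompt in the question suggests GNU sed is available). The function name redact is made up for illustration, and the hostname patterns are a guess based only on the samples shown.

```shell
# Sketch only: assumes GNU sed (-E, the I flag and \b are GNU extensions).
# "redact" is a hypothetical helper name; it filters stdin to stdout.
redact() {
  sed -E \
    -e 's/[[:alnum:]_]*password[[:space:]]*[:=]?[[:space:]]*[^[:space:]]+/REMOVED/gI' \
    -e 's/\b(web-[[:alnum:]-]+|chicagonode[[:alnum:]]*)/REMOVED/g'
}

# Try it on a few of the sample lines from the question.
printf '%s\n' \
  'one two three' \
  'password: secret123 do-not-delete' \
  'app_password=secret123 do-not-delete' \
  'web-one do-not-delete' \
  'chicagonode1 do-not-delete' | redact
```

The first expression replaces an optional prefix such as app_, the keyword password, an optional ":" or "=" with surrounding whitespace, and the next non-space token. The second replaces a whole hostname token. A line like "one two three" matches neither expression and passes through unchanged. Note that any word directly following "password" is treated as the secret, so ordinary prose containing that keyword would be altered too.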
