I've got a bunch of logs in which, on a regular basis, I need to replace sensitive information like passwords and hostnames. Let's say all occurrences need to be replaced with the keyword REMOVED. I cannot delete them, as there needs to be proof that the data was there.
Using sed is a requirement.
Unfortunately, I ran into problems with a few use cases:
Here the target is the “password” keyword along with its argument (the actual password). Everything after it is supposed to stay untouched (do-not-delete). Multiple “password” combinations are expected, like:
password secret123 do-not-delete
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete
Here are a few examples for hostnames. I'm expecting “web-*” and “chicagonode*”, and, same as above, everything after the hostname must stay:
web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete
I tried something like this, but it doesn't work:
sed "s/password. [:alnum:]/REMOVED/gi" logfile.txt
Does anyone have an idea how to solve this puzzle? It can be multiple sed commands; one-liners are not necessary.
EDIT:
Thanks HatLess! Your command works, but it also removes keywords which are not supposed to be removed, e.g. words in the "one" to "six" lines of the example below:
parallels@debian-gnu-linux-10:/media/psf/Home$ cat input
one two three
four fife
six
password secret123
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete
web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete
parallels@debian-gnu-linux-10:/media/psf/Home$ sed 's/\(password[^[:alpha:]]*\)\?[^ ]*\(.*\)/REMOVED\2/' input
REMOVED two three
REMOVED fife
REMOVED
REMOVED
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
We are trying to remove only "password*secret123", "web-*" and "chicagonode*".
Sorry for the confusion.
CodePudding user response:
Using sed
$ cat input_file
password secret123 do-not-delete
password: secret123 do-not-delete
password = secret123 do-not-delete
app_password=secret123 do-not-delete
web-one do-not-delete
web-two do-not-delete
chicagonode1 do-not-delete
chicagonode2 do-not-delete
$ sed 's/\(password[^[:alpha:]]*\)\?[^ ]*\(.*\)/REMOVED\2/' input_file
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
REMOVED do-not-delete
\(password[^[:alpha:]]*\)\?
- Optional group. When present, it matches password plus any following non-alphabetic characters, i.e. up to the next alphabetic character. The \? makes it optional. Although it is a parenthesized group, its text is dropped because the replacement never calls it with the back reference \1
[^ ]*
- Matches everything up to the next space: the secret after password, or, when the line does not start with password, the first word of the line. As it is not in parentheses, it is dropped as well.
\(.*\)
- Matches everything else. This is the second parenthesized group and, as such, can be retained and returned with the back reference \2
/REMOVED\2/
- Replaces everything that was dropped with REMOVED and returns the second parenthesized group with the back reference \2
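Because both leading parts of the command above are optional, it rewrites the first word of every line, which is what the edit in the question runs into. A narrower sketch that only touches the password forms and the two hostname prefixes might look like the following. It assumes GNU sed (the I flag for case-insensitive matching is a GNU extension), and redact is just a hypothetical helper name used for illustration:

```shell
# Hypothetical helper (the name "redact" is an assumption, not from the thread).
# Assumes GNU sed: -E enables extended regexes, I makes the match case-insensitive.
redact() {
  sed -E \
    -e 's/[^[:space:]]*password[[:space:]]*[:=]?[[:space:]]*[^[:space:]]+/REMOVED/gI' \
    -e 's/(^|[^[:alnum:]_-])(web-|chicagonode)[^[:space:]]*/\1REMOVED/g'
}

# First expression: any word containing "password", an optional ":" or "=",
# and the value that follows it.
# Second expression: words starting with "web-" or "chicagonode"; \1 puts back
# the character (if any) that precedes the hostname.
printf '%s\n' \
  'one two three' \
  'password: secret123 do-not-delete' \
  'app_password=secret123 do-not-delete' \
  'web-one do-not-delete' | redact
```

With the sample lines above, "one two three" should pass through unchanged and the other three lines should each become "REMOVED do-not-delete". Note that the first expression treats whatever word follows password as the secret, so a log line such as "the password is wrong" would also be rewritten; tighten the separator part if that matters for your logs.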