I have yet another list of subdomains. I want to remove any wildcard
subdomain that includes one of these special characters:
()!&$#* ?
The junk characters are mostly at the start of a line, but they can also appear in the middle. Here is a sample of the data:
(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
------------------------------------------------------------i.imgur.com
This has been quite an inconvenience while scanning through the list. As always, I'm trying to fix it with sed:
sed -i "/[!()#$&? ]/d" foo.txt ###Didn't work
sed -i "/[\!\(\)\#\$\&\?\ ]/d" ###Escaping char didn't work
Running the commands above leaves the list unchanged, and the file stays in its original state. I'm thinking that the fix is to chain a series of sed
commands that remove the characters one by one:
cat foo.txt | sed -e "/!/d" -e "/#/d" -e "/\*/d" -e "/\$/d" -e "/(/d" -e "/)/d" -e "/ /d" -e "/\'/d" -e "/&/d" >> foo2.txt
cat foo.txt | sed -e "/\!/d" | sed -e "/\#/d" | sed -e "/\*/d" | sed -e "/\$/d" | sed -e "/\ /d" | sed -e "/\'/d" | sed -e "/\&/d" >> foo2.txt
If escaping every special character doesn't work, my logic must be wrong. I also tried /g,
which didn't help either.
As a side note: I don't want - to be deleted, as some valid subdomains can contain the - character:
line-apps.com
line-apps-beta.com
line-apps-rc.com
line-apps-dev.com
Any help would be cherished.
CodePudding user response:
Using sed
$ sed '/[[:punct:]]/d' input_file
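This deletes every line containing a punctuation character. Be aware that [[:punct:]] also matches . and -, so every hostname in the list would be removed as well. It would help if you provided sample data.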
CodePudding user response:
I ended up using single quotes (''), as mentioned by @potong:
sed '/[\!\?\ \,\#\$\&\*\(\)\[\]\ ]/d'
No idea why it does that but shell is always the target to blame.
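One difference that can be checked directly is what the shell hands to sed under each quoting style. The sketch below uses two hypothetical variables, dq and sq, holding a shortened version of the pattern. It only shows how the quoting changes the string and does not claim to explain the original failure.

```shell
# Inside double quotes the shell removes the backslash before $,
# but keeps it before characters such as &.
dq="/[\$\&]/d"

# Inside single quotes nothing is interpreted; every character is literal.
sq='/[\$\&]/d'

printf 'double-quoted: %s\n' "$dq"
printf 'single-quoted: %s\n' "$sq"
```

In an interactive bash session, a ! inside double quotes can additionally be subject to history expansion, which does not happen inside single quotes.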
CodePudding user response:
What you're trying to do in your answer (which adds [, ] and more to the set of characters in your question) would be:
sed '/[][!? ,#$&*() ]/d'
or just:
grep -v '[][!? ,#$&*() ]'
Per POSIX, to include ] in a bracket expression it must be the first character (after an optional leading ^); otherwise it indicates the end of the bracket expression.
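As a small illustration of that rule, here is a sketch with made-up sample lines (the hostnames containing [ and ] are hypothetical, not from the question). The bracket expression [][] has ] as its first member, followed by [.

```shell
# Hypothetical sample: the first and third lines contain ] or [
sample=$(printf '%s\n' 'a]b.example.com' 'line-apps.com' 'x[y.example.com')

# ] comes first inside the brackets, so it is a literal member of the set;
# grep -v drops every line that contains either ] or [
kept=$(printf '%s\n' "$sample" | grep -v '[][]')

printf '%s\n' "$kept"
```

Only line-apps.com should remain, since it is the only line without a bracket character.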
Consider printing lines you want instead of deleting lines you do not want, though, e.g.:
grep -E '^[[:alnum:]_.-]+$' file
to print lines that only contain letters, numbers, underscores, dashes, and/or periods.
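A minimal sketch of this allow-list approach, run against a few lines copied from the question. It assumes the bracket expression has to be repeated to cover the whole line (done here with -E and +); the variable names input and valid are only for illustration.

```shell
# Sample lines taken from the question
input=$(printf '%s\n' \
  '(www.imgur.com' \
  '***************diet.blogspot.com' \
  '*-1.gbc.criteo.com' \
  'line-apps.com' \
  'line-apps-beta.com')

# Keep only lines made up entirely of letters, digits, _, . and -
valid=$(printf '%s\n' "$input" | grep -E '^[[:alnum:]_.-]+$')

printf '%s\n' "$valid"
```

Note that a line consisting of a long run of dashes followed by a hostname would still pass this filter, because - is in the allowed set. If such lines are unwanted, one option is to require the first character to be alphanumeric.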