Text file:
https://www.google.com/1/
https://www.google.com/2/
https://www.google.com
https://www.bing.com
https://www.bing.com/2/
https://www.bing.com/3/
Expected Output:
https://www.google.com/1/
https://www.bing.com
What I Tried
awk -F'/' '!a[$3]++' $file
Output:
https://www.google.com/1/
https://www.google.com
https://www.bing.com
https://www.bing.com/2/
I have already tried various commands and none of them works as expected. I just want to keep exactly one URL per domain from the list.
Please tell me how I can do this with a Bash script or with Python.
PS: I want to filter and save the full URLs from the list, not only the root domains.
CodePudding user response:
With awk and / as the field separator:
awk -F '/' '!seen[$3]++' file
seen[$3] is zero (false) the first time a given host appears as the third /-separated field, so !seen[$3]++ is true only for the first URL of each domain, and awk's default action prints that line.
If your file contains Windows line breaks (carriage returns), the trailing \r becomes part of the third field on URLs with no path (www.google.com\r is not the same key as www.google.com), which is why duplicates slip through. In that case I suggest:
dos2unix < file | awk -F '/' '!seen[$3]++'
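If dos2unix is not installed, a sketch of the same idea in plain awk, stripping the carriage return before deduplicating (an alternative I am suggesting, not part of the original answer):
# sub() removes a trailing \r from the record, which also re-splits the fields
awk -F '/' '{ sub(/\r$/, "") } !seen[$3]++' file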
Output:
https://www.google.com/1/
https://www.bing.com
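Since the question also asks for Python, here is a minimal sketch of the same first-one-wins logic; the filename urls.txt is an assumption, and urllib.parse.urlsplit is used to extract the host instead of splitting on /:

from urllib.parse import urlsplit

seen = set()
with open("urls.txt", encoding="utf-8") as fh:  # filename is an assumption
    for line in fh:
        url = line.strip()  # strip() also removes a trailing \r
        if not url:
            continue
        host = urlsplit(url).netloc  # e.g. www.google.com
        if host not in seen:  # keep only the first URL seen for each host
            seen.add(host)
            print(url)  # print the full URL, not just the domain

This keeps input order and prints the full URL of the first line seen for each host, matching the expected output above.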