Using sed to replace tabs if input is not guaranteed to contain tabs?

Time:09-14

I'm trying to extract a list of names from a website using sed, but I'm not sure how to go about replacing the tab characters separating them. This code:

curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | html2text

gives me the names for September 12th, but they are separated by a tab character:

Åsa    Åslög

If I change the sed script to replace tabs with comma and space, like this:

curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text

it works as expected:

Åsa, Åslög

However, if I try on a day that only has one name, such as September 13th:

curl -s "https://namnidag.se/?year=2022&month=9&day=13" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text

I get no output; the first sed script without the tab replacement works fine in this case though. What am I doing wrong here? I'm using GNU sed 4.8, if that helps.

Thanks!

CodePudding user response:

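You need to remove the p flag from the tab substitution. With -n, sed prints nothing by default, and the p flag on an s command prints the line only when that substitution actually matches. On a day with a single name there is no tab, so the second substitution never matches and nothing is printed. Keep p on the first substitution and do the tab replacement in a separate sed without p: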

curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | sed -e 's/\t/, /'
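The same idea can also be written as a single sed call, by grouping both substitutions and an explicit p inside one block that only runs on the line containing the names. This is a minimal sketch, not tested against the live site: the function name extract_names and the sample input lines are made up for illustration, and the g flag is an addition that assumes a day could have more than two names. \t in the pattern is a GNU sed feature.

```shell
# Hypothetical helper: reads HTML on stdin and prints the names joined by ", ".
extract_names() {
  sed -nE "/<div class='names'>/{
    s#<div class='names'>([^<]*)</div>#\1#
    s/\t/, /g
    p
  }"
}

# Made-up sample input standing in for the curl output.
printf "<div class='names'>Åsa\tÅslög</div>\n" | extract_names
printf "<div class='names'>Sture</div>\n" | extract_names
```

Because p is a separate command inside the block, the line is printed whether or not the tab substitution matched.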

CodePudding user response:

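# Save the page to f1.
curl -s "https://namnidag.se/?year=2022&month=9&day=12" > f1
# ed script: append line 71 of f1 to f2, then quit.
# This assumes the names are always on line 71 of the page.
cat > ed1 <<EOF
71W f2
q
EOF

ed -s f1 < ed1
# Keep the last 20 bytes of that line, then drop the final 6 (head -c -N needs GNU head).
tail -c 20 f2 | head -c -6 > file
rm -v ./ed1
rm -v ./f2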

This will give you the names, whether there are two of them or not. If there are two, you can separate them with cut.
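As a rough illustration of the cut step, assuming the extracted names end up on one tab-separated line (the sample lines below are made up, not real output of the script above):

```shell
# cut splits on tabs by default; -f picks the field.
printf 'Åsa\tÅslög\n' | cut -f1
printf 'Åsa\tÅslög\n' | cut -f2

# A line with no tab at all is passed through unchanged (unless -s is given),
# so a single-name day still prints the name.
printf 'Sture\n' | cut -f1
```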
