I'm trying to further parse an output file I generated, using an additional grep command. The code I'm currently using is:
#!/bin/bash
# fetches the links of the movie's imdb pages for a given actor
# fullname="USER INPUT"
read -p "Enter fullname: " fullname
if [ "$fullname" = "Charlie Chaplin" ]; then
code="nm0000122"
else
code="nm0000050"
fi
curl "https://www.imdb.com/name/$code/#actor" |
grep -Eo 'href="/title/[^"]*' |
sed 's#^.*href=\"/#https://www.imdb.com/#g' |
sort -u > imdb_links.txt
# Parses each link in the links file and gets the details for each movie.
# This is followed by the cleaning process.
for i in $(cat imdb_links.txt)
do
curl "$i" |
html2text |
sed -n '/Sign_In/,$p'|
sed -n '/YOUR RATING/q;p' |
head -n-1 |
tail -n 2
done > imdb_all.txt
The sample generated output is:
EN
⁰
* Fully supported
* English (United States)
* Partially_supported
* Français (Canada)
* Français (France)
* Deutsch (Deutschland)
* हिंदी (भारत)
* Italiano (Italia)
* Português (Brasil)
* Español (España)
* Español (México)
****** Duck Soup ******
* 19331933
* Not_RatedNot Rated
* 1h 9m
IMDb RATING
7.8/10
How do I change the code to further parse the output so that I get only the data from the movie title through the IMDb rating (in this case, from the line containing the title 'Duck Soup' to the end)?
CodePudding user response:
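Using sed, print from the first line that contains consecutive asterisks (the title banner) through the end of the file: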
$ sed -n '/\*[^[:alpha:] ]*\*/,$ p' input_file
****** Duck Soup ******
* 19331933
* Not_RatedNot Rated
* 1h 9m
IMDb RATING
7.8/10
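To check that address pattern without fetching anything, the command can be wrapped in a small function and fed a trimmed copy of the sample output. This is only a sketch: the function name from_title and the shortened sample are assumptions made for illustration, not part of the answer above.

```shell
#!/bin/bash
# from_title is a hypothetical helper name used only for this sketch.
# It prints everything from the first line that contains two asterisks
# with no letters or spaces between them (the html2text title banner)
# through the end of the input.
from_title() {
    sed -n '/\*[^[:alpha:] ]*\*/,$ p'
}

# Trimmed sample of the html2text output shown in the question.
sample='* Español (México)
****** Duck Soup ******
* 19331933
IMDb RATING
7.8/10'

printf '%s\n' "$sample" | from_title
```

The language-list lines start with a single asterisk followed by a space, so they never satisfy the pattern; only the banner line does, and the range then runs to the last line.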
CodePudding user response:
Here is the code:
#!/bin/bash
# fullname="USER INPUT"
read -p "Enter fullname: " fullname
if [ "$fullname" = "Charlie Chaplin" ]; then
code="nm0000122"
else
code="nm0000050"
fi
rm -f imdb_links.txt
curl "https://www.imdb.com/name/$code/#actor" |
grep -Eo 'href="/title/[^"]*' |
sed 's#^href="#https://www.imdb.com#g' |
sort -u |
while IFS= read -r link; do
# uncomment the next line to save links into file:
#echo "$link" >>imdb_links.txt
curl "$link" |
html2text -utf8 |
sed -n '/Sign_In/,/YOUR RATING/ p' |
sed -n '$d; /^\*\{6\}.*\*\{6\}$/,$ p'
done >imdb_all.txt
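The two sed stages inside the loop can be tried offline against a saved page before running the whole script. The following is a minimal sketch under that assumption: the function name extract_movie and the shortened page text are hypothetical, and only the two sed commands are taken from the script above.

```shell
#!/bin/bash
# extract_movie is a hypothetical name wrapping the two sed stages from
# the loop above, so they can be exercised without curl or html2text.
extract_movie() {
    # Stage 1: keep only the block from "Sign_In" through "YOUR RATING".
    sed -n '/Sign_In/,/YOUR RATING/ p' |
    # Stage 2: delete the last line (the "YOUR RATING" marker), then print
    # from the title banner (six asterisks on each side) to the end.
    sed -n '$d; /^\*\{6\}.*\*\{6\}$/,$ p'
}

# Shortened stand-in for the html2text rendering of one movie page.
page='header
Sign_In
* English (United States)
****** Duck Soup ******
* 1h 9m
IMDb RATING
7.8/10
YOUR RATING
footer'

printf '%s\n' "$page" | extract_movie
```

Anchoring the banner pattern with ^ and $ and requiring six asterisks on each side makes it stricter than matching any pair of asterisks, at the cost of depending on html2text keeping that exact banner width.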