Home > Mobile >  Bash sed command issue
Bash sed command issue

Time:04-12

I'm trying to further parse an output file I generated using an additional grep command. The code that I'm currently using is:

##!/bin/bash

# fetches the links of the movie's imdb pages for a given actor

# fullname="USER INPUT"
read -p "Enter fullname: " fullname

if [ "$fullname" = "Charlie Chaplin" ];
code="nm0000122"
then
code="nm0000050"
fi


curl "https://www.imdb.com/name/$code/#actor" | grep -Eo 
'href="/title/[^"]*' | sed 's#^.*href=\"/#https://www.imdb.com/#g' | 
sort -u > imdb_links.txt

#parses each of the link in the link text file and gets the details for 
each of the movie. THis is followed by the cleaning process
for i in $(cat imdb_links.txt) 
do 
   curl $i | 
   html2text | 
   sed -n '/Sign_In/,$p'|  
   sed -n '/YOUR RATING/q;p' | 
   head -n-1 | 
   tail -n 2 
done > imdb_all.txt

The sample generated output is:

EN
⁰
    * Fully supported
    * English (United States)
    * Partially_supported
    * Français (Canada)
    * Français (France)
    * Deutsch (Deutschland)
    * हिंदी (भारत)
    * Italiano (Italia)
    * Português (Brasil)
    * Español (España)
    * Español (México)
****** Duck Soup ******
    * 19331933
    * Not_RatedNot Rated
    * 1h 9m
IMDb RATING
7.8/10

How do I change the code to further parse the output to get only the data from the title of the movie up until the imdb rating ( in this case, the line that contains the title 'Duck Soup' up until the end.

CodePudding user response:

Using sed

$ sed -n '/\*[^[:alpha:] ]*\*/,$ p' input_file
****** Duck Soup ******
    * 19331933
    * Not_RatedNot Rated
    * 1h 9m
IMDb RATING
7.8/10

CodePudding user response:

Here is the code:

#!/bin/bash

# fullname="USER INPUT"
read -p "Enter fullname: " fullname

if [ "$fullname" = "Charlie Chaplin" ]; then
  code="nm0000122"
else
  code="nm0000050"
fi

rm -f imdb_links.txt

curl "https://www.imdb.com/name/$code/#actor" |
  grep -Eo 'href="/title/[^"]*' |
  sed 's#^href="#https://www.imdb.com#g' |
  sort -u |
while read link; do
   # uncomment the next line to save links into file:
   #echo "$link" >>imdb_links.txt

   curl "$link" |
     html2text -utf8 |
     sed -n '/Sign_In/,/YOUR RATING/ p' |
     sed -n '$d; /^\*\{6\}.*\*\{6\}$/,$ p'
done >imdb_all.txt
  • Related