How can I extract a few entries based on certain conditions?

Time:07-30

I have data in the following format.

>ab:xy_a0by98-2 \Movie= top gun \actor= Tom \Genere=Action \Length=234 \Credits=30 \pe=1 \summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_b0ha81-5 \Movie= Thor \actor= chris hemsworth \Genere=Action \Length=321 \Credits=20 \pe=0 \summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace

>ab:xy_c0ma65-1 \Movie= Batman \actor= Bale \Genere=Action \Length=251 \Credits=30 \pe=1 \summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_d0fc78-2 \Movie= Joker \actor= Phoenix \Genere=thriller \Length=341 \Credits=35 \pe=2 \summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver

>ab:xy_e0ra81-2 \Movie= Superman \actor= henry cavill \Genere=Action \Length=254 \Credits=28 \pe=1 \summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

I want to extract all the entries that contain pe=1. Each entry starts with the > symbol, so the result should be:

>ab:xy_a0by98-2 \Movie= top gun \actor= Tom \Genere=Action \Length=234 \Credits=30 \pe=1 \summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer

>ab:xy_c0ma65-1 \Movie= Batman \actor= Bale \Genere=Action \Length=251 \Credits=30 \pe=1 \summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.

>ab:xy_e0ra81-2 \Movie= Superman \actor= henry cavill \Genere=Action \Length=254 \Credits=28 \pe=1 \summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

and then format a few of the values as a table:

Name            Length
ab:xy_a0by98-2  234
ab:xy_c0ma65-1  251
ab:xy_e0ra81-2  254

I tried grep "pe=1" input.txt > output.txt, but it extracted only the first (header) line of each entry, not the description. Any help appreciated.

CodePudding user response:

This sed command should print the name and length of each pe=1 entry:

sed -n 's/^>\([^[:blank:]]*\).*\\Length=\([0-9]*\).*\\pe=1.*/\1 \2/p' file
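
The command above produces only the name/length pairs. For the first part of the question (whole entries, including their description lines), grep falls short because it works line by line and only the header line contains pe=1. A minimal awk sketch, assuming every entry starts with a line beginning with > and runs until the next such line; the function name extract_pe1 is hypothetical:

```shell
# Hypothetical helper: print every entry whose header line carries \pe=1.
extract_pe1() {
  awk '
    # On each header line, decide whether this entry is wanted.
    # ([^0-9]|$) keeps pe=10, pe=12, ... from matching.
    /^>/ { keep = ($0 ~ /\\pe=1([^0-9]|$)/) }
    # While the flag is set, print the header and every line that follows it.
    keep
  ' "$1"
}
```

Usage would be something like extract_pe1 input.txt > output.txt. Blank separator lines after a kept entry are printed too, so the output keeps the spacing of the input.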

CodePudding user response:

1st solution (with GNU awk): With your shown samples, please try the following awk code. Written and tested in GNU awk (the three-argument form of match() is a gawk extension).

awk '/^>.*\\pe=1 / && match($0,/\\Length=([0-9]+)/,arr){print $1,arr[1]}' Input_file


2nd solution (with any awk): With any version of awk, please try the following code, a small tweak of the 1st solution.

awk '
/^>.*\\pe=1 / && match($0,/\\Length=[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*=/,"",val)
  print $1,val
}
' Input_file
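
Both awk versions print $1 as it is, so the name still carries its leading >, and neither prints a header row. If the output should look like the table in the question, a variant along these lines might work. It is only a sketch: the function name pe1_table and the 16-character column width are assumptions, and it relies on nothing beyond POSIX awk.

```shell
# Hypothetical helper: print a Name/Length table for entries flagged \pe=1.
pe1_table() {
  awk '
    BEGIN { printf "%-16s%s\n", "Name", "Length" }
    /^>/ && /\\pe=1([^0-9]|$)/ && match($0, /\\Length=[0-9]+/) {
      # Skip the 8 characters of "\Length=" to keep only the digits.
      len = substr($0, RSTART + 8, RLENGTH - 8)
      # substr($1, 2) drops the leading ">" from the name.
      printf "%-16s%s\n", substr($1, 2), len
    }
  ' "$1"
}
```

It would be called as pe1_table Input_file. Names longer than 15 characters would run into the Length column, so the width may need adjusting for other data.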