I have data in the following format:
>ab:xy_a0by98-2 \Movie= top gun \actor= Tom \Genere=Action \Length=234 \Credits=30 \pe=1 \summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_b0ha81-5 \Movie= Thor \actor= chris hemsworth \Genere=Action \Length=321 \Credits=20 \pe=0 \summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace
>ab:xy_c0ma65-1 \Movie= Batman \actor= Bale \Genere=Action \Length=251 \Credits=30 \pe=1 \summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_d0fc78-2 \Movie= Joker \actor= Phoenix \Genere=thriller \Length=341 \Credits=35 \pe=2 \summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver
>ab:xy_e0ra81-2 \Movie= Superman \actor= henry cavill \Genere=Action \Length=254 \Credits=28 \pe=1 \summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
I want to extract all the entries that contain pe=1 (each entry starts with the > symbol), as follows:
>ab:xy_a0by98-2 \Movie= top gun \actor= Tom \Genere=Action \Length=234 \Credits=30 \pe=1 \summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_c0ma65-1 \Movie= Batman \actor= Bale \Genere=Action \Length=251 \Credits=30 \pe=1 \summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_e0ra81-2 \Movie= Superman \actor= henry cavill \Genere=Action \Length=254 \Credits=28 \pe=1 \summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
and then format a few of the values as a table:
Name Length
ab:xy_a0by98-2 234
ab:xy_c0ma65-1 251
ab:xy_e0ra81-2 254
I tried grep "pe=1" input.txt > output.txt, but it extracted only the first line of each entry, not the description.
Any help appreciated...
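grep is line-oriented, so it can only print the header line that contains pe=1; the description lines that follow do not match. A minimal awk sketch, assuming every entry starts with a > header line and the flag always appears as \pe=N in that header, remembers the decision made on each header and applies it to the following lines. The function name extract_pe1 is hypothetical, and the extra test for a non-digit (or end of line) after pe=1 is there so that a value such as pe=10 is not picked up.

```shell
# extract_pe1 (hypothetical name): print every entry whose header line
# contains \pe=1, together with the description lines that follow it.
# Reads the files given as arguments, or stdin when none are given.
extract_pe1() {
  awk '
    # A ">" line starts a new entry: decide once whether to keep it.
    # Require a non-digit or end of line after "pe=1" so pe=10 is skipped.
    /^>/ { keep = ($0 ~ /\\pe=1[^0-9]/ || $0 ~ /\\pe=1$/) }
    # Print the header and every description line of a kept entry.
    keep
  ' "$@"
}
```

Usage: extract_pe1 input.txt > output.txt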
Answer:
This sed command should do the job for the table (it prints the ID and length of each pe=1 entry):
sed -n 's/^>\([^[:blank:]]*\).*\\Length=\([0-9]*\).*\\pe=1.*/\1 \2/p' file
Answer:
1st solution (with GNU awk): Based on your shown samples, please try the following awk code. Written and tested in GNU awk, which is required here because the three-argument form of match() is a GNU extension.
awk '/^>.*\\pe=1 / && match($0,/\\Length=([0-9]+)/,arr){print substr($1,2),arr[1]}' Input_file
2nd solution (with any awk): With any POSIX awk, please try the following code, a small tweak of the 1st solution that uses the two-argument match() together with RSTART and RLENGTH.
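awk '
/^>.*\\pe=1 / && match($0,/\\Length=[0-9]+/){
  val=substr($0,RSTART,RLENGTH)   # matched text, e.g. "\Length=234"
  sub(/.*=/,"",val)               # keep only the number after "="
  print substr($1,2),val          # ID without the leading ">", then the length
}
' Input_file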
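Both answers print only the table. If you also want the full pe=1 entries in a file, a single awk pass can do both. This is a sketch under the same assumptions as above (a > header starts each entry, the ID is the first whitespace-separated field, and \Length= is followed by digits); the function name pe1_report and its two positional arguments are hypothetical.

```shell
# pe1_report (hypothetical name): one pass over the input file ($1) that
#   - writes every \pe=1 entry (header plus description lines) to file $2
#   - prints a "Name Length" table for those entries on stdout
pe1_report() {
  awk -v entries="$2" '
    BEGIN { print "Name", "Length" }
    /^>/ {
      # New entry: keep it only if the flag is exactly pe=1 (not pe=10).
      keep = ($0 ~ /\\pe=1[^0-9]/ || $0 ~ /\\pe=1$/)
      if (keep && match($0, /\\Length=[0-9]+/)) {
        # Skip the 8 characters of "\Length=" to get just the number.
        len = substr($0, RSTART + 8, RLENGTH - 8)
        print substr($1, 2), len    # ID without the leading ">"
      }
    }
    # Copy header and description lines of kept entries to the file.
    keep { print > entries }
  ' "$1"
}
```

Usage: pe1_report input.txt output.txt prints the table and writes the matching entries to output.txt. Redirecting with print > entries truncates the file once, on first use, and keeps appending for the rest of the run.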