I have a list of movie names like this:
Film Name - Film.information.lanugage.2160p.more.info
Film Name - Film.info.information.1080p.more.info
Film Name - Film.information.lanugage.1080p.information.info
Film Name - Film.information.more.720p.more.info
Film Name - Film.more.lanugage.2160p.more.info
I am using grep '[0-9][0-9][0-9][0-9]p' list.txt > resolution.txt
to filter by resolution, and I am looking for a sed command that deletes the - and everything after it.
I think it should look something like this:
sed 's/-.*$//g' list.txt > cleanList.txt
After that, I want to append the resolution from resolution.txt to the end of the corresponding line in cleanList.txt.
The final file should look like this:
Film Name 2160p
Film Name 1080p
Film Name 1080p
Film Name 720p
Film Name 2160p
CodePudding user response:
You can use
sed -E 's/(.*) - (.*[^0-9])?((480|720|1080|1440|2160|4320)p?)([^0-9].*)?/\1 \3/' list.txt > output.txt
Details:
(.*) - Group 1: matches and captures as many of any chars as possible
" - " - a literal space, hyphen, and space
(.*[^0-9])? - Group 2 (optional): any text and then a non-digit char
((480|720|1080|1440|2160|4320)p?) - Group 3: any of the common resolution values (captured in Group 4) and then an optional p
([^0-9].*)? - Group 5 (optional): a non-digit char and then any text.
The \1 \3 replacement replaces the matched text with the Group 1 value, a space, and the Group 3 value.
See the online demo:
#!/bin/bash
s='Film Name - Film.information.lanugage.2160p.more.info
Film Name - name name - Film.info.information.1080p.more.info
Star Wars - Episode V - Das Imperium schlägt zurück - Star.Wars.Episode.V.Das.Imperium.schlaegt.zurueck.1980.German.DL.2160p.UHD.BluRay.x265-ENDSTATiON
Film Name - Film.information.lanugage.1080p.information.info
Film Name - asfasfaf - Film.information.more.720p.more.info
Film Name - Film.more.lanugage.2160p.more.info
Boss Baby - Schluss mit Kindergarten - pso-bossbaby2_bd.1080p
Sicario 2 - encounters-si2so_1080p
Skyscraper - encounters-skyscraper_1080p
Unsere Zeit ist jetzt - roor-unserezeit-1080p
Schindlers Liste - d-schindlersliste-1080p
South Park: Der Film – größer, länger, ungeschnitten - in-southpark1080p
Ein Hund namens Palma - rf-ehnp2021.1080
Taxi Driver (1976) - d-taxidriver-1080p
The Taking of Deborah Logan - The.Taking.of.Deborah.Logan.2014.LIMITED.1080p.BluRay.X264-CADAVER
Die Feuerzangenbowle 1944 - d-feuerzangenbowle-1080p
Hooligans - rsg-hooligans-1080p
Geständnisse - Confessions - wombat-gestaendnisse-1080p
Greyhound - greyhound.2020.german.dl.1080p.web.h264-wayne'
sed -E 's/(.*) - (.*[^0-9])?((480|720|1080|1440|2160|4320)p?)([^0-9].*)?/\1 \3/' <<< "$s"
Output:
Film Name 2160p
Film Name - name name 1080p
Star Wars - Episode V - Das Imperium schlägt zurück 2160p
Film Name 1080p
Film Name - asfasfaf 720p
Film Name 2160p
Boss Baby - Schluss mit Kindergarten 1080p
Sicario 2 1080p
Skyscraper 1080p
Unsere Zeit ist jetzt 1080p
Schindlers Liste 1080p
South Park: Der Film – größer, länger, ungeschnitten 1080p
Ein Hund namens Palma 1080
Taxi Driver (1976) 1080p
The Taking of Deborah Logan 1080p
Die Feuerzangenbowle 1944 1080p
Hooligans 1080p
Geständnisse - Confessions 1080p
Greyhound 1080p
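The output above keeps "1080" without the p for the one input line that lacks it, because p is optional inside Group 3. If a uniform suffix is wanted, one possible variant (an assumption on my part, not part of the command above) is to print Group 4, which holds only the digits, and append a literal p in the replacement:

```shell
# Two sample lines from the demo above; the second has no trailing "p".
input='Film Name - Film.information.lanugage.2160p.more.info
Ein Hund namens Palma - rf-ehnp2021.1080'

# Same pattern as above, but the replacement uses Group 4 (digits only)
# followed by a literal "p" instead of Group 3.
result=$(printf '%s\n' "$input" |
  sed -E 's/(.*) - (.*[^0-9])?((480|720|1080|1440|2160|4320)p?)([^0-9].*)?/\1 \4p/')

printf '%s\n' "$result"
```

Both lines should then end in a resolution with the p suffix, whether or not the source line had one.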
CodePudding user response:
You can use the pipe operator '|' to pass the output of one command as the input of a second command. For example:
grep '[0-9][0-9][0-9][0-9]p' list.txt | sed 's/-.*$//g' > cleanList.txt
Note that sed gets no file argument here, so it reads from the pipe instead of from list.txt.
If you want to save the output of the first command to a file AND process it with the second, use the tee command to write the same output to both. Example: grep '...' list.txt | tee resolution.txt | sed '...' > cleanList.txt
See: https://www.geeksforgeeks.org/tee-command-linux-example/, "How to redirect output to a file and stdout", and "How does a pipe work in Linux?"
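A minimal sketch of the grep, tee, and sed chain described in this answer, run in a scratch directory. The third sample line and the `[0-9]{3,4}p` pattern (widened so that 720p also matches) are assumptions added for illustration:

```shell
# Scratch directory so the sketch does not overwrite real files (an assumption of this example).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two lines from the question plus one hypothetical line without a resolution.
printf '%s\n' \
  'Film Name - Film.information.lanugage.2160p.more.info' \
  'Some Notes - no.resolution.here' \
  'Film Name - Film.information.more.720p.more.info' > list.txt

# grep keeps the lines that contain a 3- or 4-digit resolution,
# tee saves those lines to resolution.txt and passes them on,
# sed reads from the pipe (no file argument) and strips " - " and the rest.
grep -E '[0-9]{3,4}p' list.txt | tee resolution.txt | sed 's/ - .*$//' > cleanList.txt

cat cleanList.txt
```

Note that resolution.txt receives the whole matching lines here, not just the resolution; grep's -o option would print only the matched part if that is what the file should contain.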
CodePudding user response:
I suggest using awk, which gives you a cleaner solution in one pass, rather than using grep and sed.
Try:
awk -F" - " '{match($2, "[0-9]+p"); print $1, substr($2, RSTART, RLENGTH)}' list.txt > cleanList.txt
I use the string " - " as the field separator between $1 and $2 on each input line.
The function match() looks for a regex corresponding to digits followed by the letter p inside $2. It sets the variables RSTART and RLENGTH in a way that lets the function substr() extract the matching text so it can be printed.
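As a small illustration of how match(), RSTART, and RLENGTH fit together, here is a sketch on two lines from the question. The guard on the return value of match() is my own addition: match() returns 0 when nothing is found, so lines without a resolution print only the name.

```shell
# Sample lines from the question.
input='Film Name - Film.information.lanugage.2160p.more.info
Film Name - Film.information.more.720p.more.info'

# match() returns the start position of the match (0 if there is none)
# and sets RSTART and RLENGTH, which substr() uses to cut out the match.
result=$(printf '%s\n' "$input" | awk -F' - ' '{
    if (match($2, /[0-9]+p/))
        print $1, substr($2, RSTART, RLENGTH)
    else
        print $1    # no resolution found: keep the name only
}')

printf '%s\n' "$result"
```

Because the field separator is " - ", a title that itself contains " - " would shift the release name into a later field, so this sketch assumes the separator occurs once per line, as in the question's data.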