How to match the word between digits and colon with sed

Time:08-15

I am trying to figure out how to match everything between the time and the colon (the colon should also be removed) with sed. The time can be different (24H format) and should not be limited to 21:30 which is the example here.

I am trying to delete the UNWANTED TEXT:

#EXTINF:-1,21:30 UNWANTED TEXT: Apples - Oranges

The output wanted:

#EXTINF:-1,21:30 Apples - Oranges

Can anyone help me with this?

I tried the following. It works, but it relies on the whitespace always being there, and I want to use the time as the pattern instead.

sed 's# .*:##g'

Thanks!

CodePudding user response:

No claims to elegance, but how about: sed 's/\(.*[0-9]\{2\}:[0-9]\{2\}\)[^:]*:\(.*\)/\1\2/'

This requires the time to be two digits, colon, two digits, and will likely fail if that pattern occurs more than once in the string.
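Here is a quick check of both the working case and the limitation. It is only a sketch, assuming a POSIX shell; the second sample line is made up for illustration and is not from the question.

```shell
# The sed expression from this answer, unchanged.
cmd='s/\(.*[0-9]\{2\}:[0-9]\{2\}\)[^:]*:\(.*\)/\1\2/'

# Sample line from the question.
good=$(printf '%s\n' '#EXTINF:-1,21:30 UNWANTED TEXT: Apples - Oranges' | sed "$cmd")
printf '%s\n' "$good"
# prints: #EXTINF:-1,21:30 Apples - Oranges

# Hypothetical line where a second dd:dd appears later in the title.
# The leading .* is greedy, so the LAST time that is followed by a colon
# is used, and only the colon right after 10:15 is removed.
bad=$(printf '%s\n' '#EXTINF:-1,21:30 UNWANTED TEXT: Live at 10:15: Encore' | sed "$cmd")
printf '%s\n' "$bad"
# prints: #EXTINF:-1,21:30 UNWANTED TEXT: Live at 10:15 Encore
```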

Breaking down the regular expression \(.*[0-9]\{2\}:[0-9]\{2\}\)[^:]*:\(.*\):

\( - start a grouping
.*  - match any characters, as many as possible, up to the pattern that follows, [0-9]\{2\}:[0-9]\{2\}
[0-9] - match any digit
[0-9]\{2\} - match any two digits
[0-9]\{2\}:[0-9]\{2\} - match ab:cd, where a,b,c,d are single decimal digits
\) - end the first grouping. We'll later refer to all the characters matched by this grouping as "\1"
[^:]* - match any number of characters that are not colons (possibly none)
[^:]*: - match any number of non-colon characters followed by a single colon. This is what we'll remove from the string.
\( - start the second grouping
\(.*\) - match any characters in the second grouping, which we'll refer to later as "\2"

Once we've done that match, we've more or less split our string into three pieces:

  • The first grouping, called "\1", which is everything from the start of the string to the time
  • The part of the string we're going to remove, which immediately follows the time and consists of a number of non-colon characters followed by a ":"
  • The second grouping, which is everything after the colon.

Basically, we replace these three pieces with the first grouping and the second grouping. That's accomplished by the /\1\2/ part of the sed command.
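To see what each grouping actually captures, one option is to keep the same regular expression and change only the replacement so that the groups are printed with labels. This is just a debugging sketch; the 1=[...] 2=[...] labels are made up for illustration.

```shell
line='#EXTINF:-1,21:30 UNWANTED TEXT: Apples - Oranges'

# Same regex as above; only the replacement differs, so \1 and \2
# are shown separately instead of being glued back together.
groups=$(printf '%s\n' "$line" |
  sed 's/\(.*[0-9]\{2\}:[0-9]\{2\}\)[^:]*:\(.*\)/1=[\1] 2=[\2]/')
printf '%s\n' "$groups"
# prints: 1=[#EXTINF:-1,21:30] 2=[ Apples - Oranges]
```

The " UNWANTED TEXT:" piece shows up in neither group, which is why it disappears when the replacement is \1\2.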

Sed regex reference: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html

CodePudding user response:

Using sed

$ sed -E 's/([0-9]+:[0-9]+)[^:]*:/\1/' input_file
#EXTINF:-1,21:30 Apples - Oranges
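Since the time is not fixed to 21:30, here is a small sketch running the same idea over a few lines, written with + ("one or more") after each digit class. Only the first line comes from the question; the other two are invented, and the last one has no label so it should pass through unchanged.

```shell
# Hypothetical playlist lines; only the first is from the question.
playlist='#EXTINF:-1,21:30 UNWANTED TEXT: Apples - Oranges
#EXTINF:-1,07:05 MORNING SHOW: Pears - Plums
#EXTINF:-1,23:59 Late Night - No Label'

# Keep the time (group 1) and drop everything up to and including the next colon.
cleaned=$(printf '%s\n' "$playlist" | sed -E 's/([0-9]+:[0-9]+)[^:]*:/\1/')
printf '%s\n' "$cleaned"
# prints:
# #EXTINF:-1,21:30 Apples - Oranges
# #EXTINF:-1,07:05 Pears - Plums
# #EXTINF:-1,23:59 Late Night - No Label
```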

CodePudding user response:

If the line can potentially contain something that looks like the timestamp in more than one place, you may want to start the match at the beginning of the line:

sed -E 's/^(#[^:]+:-?[0-9]+,[0-9]+:[0-9]+)[^:]+:/\1/'
  • -E - Extended regular expressions
  • ^ - Start of the line anchor
  • ( - Start of capture group 1
  • #[^:]+ - A literal # followed by one or more non-colon characters
  • :-? - A literal : followed by zero or one -
  • [0-9]+, - One or more digits followed by a ,
  • [0-9]+: - One or more digits followed by a :
  • [0-9]+ - One or more digits (the minutes)
  • ) - End of capture group 1
  • [^:]+: - One or more non-colons followed by a colon. Not captured. This is where UNWANTED TEXT should be

The above match is then replaced by

  • \1 - Capture 1 (which doesn't include UNWANTED TEXT)
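As a rough check of the anchored version, here is a sketch with the + quantifiers written out ("one or more", as in the breakdown). The sample line is invented for illustration: it has a second time-like string later in the title.

```shell
# Hypothetical line with a second time-like string in the title.
tricky='#EXTINF:-1,21:30 UNWANTED TEXT: Live at 10:15: Encore'

# Anchored at the start of the line, so only the label right after
# the first time is removed; the later "10:15:" is left alone.
anchored=$(printf '%s\n' "$tricky" |
  sed -E 's/^(#[^:]+:-?[0-9]+,[0-9]+:[0-9]+)[^:]+:/\1/')
printf '%s\n' "$anchored"
# prints: #EXTINF:-1,21:30 Live at 10:15: Encore
```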