Is there a way that I can use the cut command with space as delimiter and treat a word with space li

I have created this file concacaf.txt with the following input

David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2

Is there a way to either use the cut command and treat Costa Rica or El Salvador as a single word, or modify the text, so that cut -f 1,3 -d ' ' concacaf.txt gives 'Borges 2' instead of 'Borges Rica'? Thanks.

CodePudding user response：

This is not possible with cut alone, because cut splits on every space and cannot treat "Costa Rica" as one field. It is possible with sed:

sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' concacaf.txt

It matches the first word ([^ ]*, a sequence of non-space characters) at the beginning of the line and the last word at the end of the line, then replaces the entire line with those two words separated by a space.
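To see the effect on the whole file, here is a minimal sketch that recreates the sample input from the question in a temporary file and runs the same sed command on it. The variable names tmp and result are only for illustration.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2
EOF

# Keep only the first and the last word of every line.
result=$(sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' "$tmp")
rm -f "$tmp"

printf '%s\n' "$result"
# David 5
# Larin 5
# Borges 2
# Buchanan 2
# Davis 2
# Gray 2
# Henriquez 2
```

Because .* is greedy, everything between the first space and the last space is swallowed, no matter how many words the country name has.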

The -E option tells sed to use extended regular expressions. By default sed uses basic regular expressions, in which the grouping parentheses must be escaped as \( and \).
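For comparison, a small sketch of the same substitution written both ways, applied to one line from the sample file. The variable names line, ere and bre are only for illustration.

```shell
#!/bin/sh
line='Borges Costa Rica 2'

# Extended syntax (-E): bare parentheses form capturing groups.
ere=$(printf '%s\n' "$line" | sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/')

# Basic syntax (no -E): the same groups are written \( and \).
bre=$(printf '%s\n' "$line" | sed 's/^\([^ ]*\) .* \([^ ]*\)$/\1 \2/')

printf '%s\n' "$ere" "$bre"
# Borges 2
# Borges 2
```

The basic form is handy if the sed at hand does not accept -E.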

The sed command is s (substitute). It searches each line using a regular expression and replaces the matching substring with the provided replacement string. In the replacement string, \1 represents the substring matched by the first capturing group, \2 the second group and so on.

The regular expression is explained below:

^             # matches the beginning of line
(             # starts a group (it is not a matcher)
  [^ ]        # matches any character that is not a space (there is a space after `^`)
  *           # the previous sub-expression, zero or more times
)             # close the group; the matched substring is captured
              # there is a space here in the expression; it matches a space
.*            # match any character, any number of times
              # match a space
([^ ]*)       # another group that matches a sequence of non-space characters
$             # match the end of the line
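The question also asks about modifying the text so that cut itself works. One possible sketch, assuming the first and the last space on each line are the real field separators: capture the middle part as a third group and rejoin the three groups with tabs, which is the default delimiter of cut. The helper name to_tabs is hypothetical, and the tab is produced with printf so the script does not depend on sed understanding \t.

```shell
#!/bin/sh
# A literal tab character, kept in a variable for use inside the sed script.
tab=$(printf '\t')

# Hypothetical helper: turn the first and the last space of each line into tabs.
to_tabs() {
    sed -E "s/^([^ ]*) (.*) ([^ ]*)\$/\\1${tab}\\2${tab}\\3/"
}

line='Borges Costa Rica 2'

# Fields 1 and 3 (name and number); tr turns the tab back into a space.
name_and_count=$(printf '%s\n' "$line" | to_tabs | cut -f 1,3 | tr '\t' ' ')

# Field 2 is now the whole country name, inner space included.
country=$(printf '%s\n' "$line" | to_tabs | cut -f 2)

printf '%s\n' "$name_and_count" "$country"
# Borges 2
# Costa Rica
```

Lines with fewer than two spaces do not match the expression and pass through unchanged.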

CodePudding user response：

You can use rev to cut off the last field, the one containing the integer, which leaves the name and the country:

$ rev concacaf.txt | cut -d' ' -f2- | rev
David Canada
Larin Canada
Borges Costa Rica
Buchanan Canada
Davis Panama
Gray Jamaica
Henriquez El Salvador
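To make the rev trick easier to follow, here is a sketch of the pipeline stage by stage on a single line. It also shows the mirror image of the command above, keeping only the last field instead of dropping it. The variable names are only for illustration.

```shell
#!/bin/sh
line='Borges Costa Rica 2'

# Stage 1: reverse the characters, so the last field becomes the first one.
reversed=$(printf '%s\n' "$line" | rev)

# Stage 2a: drop reversed field 1 (the integer), then reverse back.
without_last=$(printf '%s\n' "$reversed" | cut -d' ' -f2- | rev)

# Stage 2b: keep only reversed field 1, then reverse back.
last_only=$(printf '%s\n' "$reversed" | cut -d' ' -f1 | rev)

printf '%s\n' "$reversed" "$without_last" "$last_only"
# 2 aciR atsoC segroB
# Borges Costa Rica
# 2
```

The second rev matters even for the number, since a multi-digit value would otherwise come out backwards.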