Using this regular expression, I'm finding a string of digits starting with 9, followed by 4, 5, 6, 7 or 9, followed by 6 more digits.
9*[45679]( *[0-9]){6}
I have a file named content.txt containing 3 columns. The first column is a date, the second a time and the third contains random text and numbers with spaces in it.
20/10/2022 19:00 test 1 99 435 18 1 more text
20/10/2022 20:00 test 2 97 123 1 81 more text2
20/10/2022 21:00 test 3 96 4 3 5567 more text3
20/10/2022 22:00 test 4 99 43 5181 more text4
Using my regular expression I want to modify the third column and leave only the results of the regular expression, with no spaces, so the result should be
20/10/2022 19:00 99435181
20/10/2022 20:00 97123181
20/10/2022 21:00 96435567
20/10/2022 22:00 99435181
CodePudding user response:
With GNU sed. I assume your field separator is one space.
sed -E 's/^(.{16}).*( 9[45679]( *[0-9]){6}).*/\1\2/; s/ //g3' file
Output:
20/10/2022 19:00 99435181
20/10/2022 20:00 97123181
20/10/2022 21:00 96435567
20/10/2022 22:00 99435181
See: man sed
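Combining a number with the g flag (as in g3) is a GNU sed extension, so the one-liner above depends on GNU sed. As a rough sketch, assuming the same single-space layout, the space removal can be handed to awk instead. The temporary file and the variable name result are only for illustration and are not part of the original answer.

```shell
# Sample rows from the question, written to a throwaway file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
20/10/2022 19:00 test 1 99 435 18 1 more text
20/10/2022 20:00 test 2 97 123 1 81 more text2
20/10/2022 21:00 test 3 96 4 3 5567 more text3
20/10/2022 22:00 test 4 99 43 5181 more text4
EOF

# Step 1 (sed): keep the first 16 characters (date and time) plus the digit run.
# Step 2 (awk): print fields 1 and 2, then glue all remaining fields together.
result=$(sed -E 's/^(.{16}).*( 9[45679]( *[0-9]){6}).*/\1\2/' "$sample" |
  awk '{ out = $1 " " $2 " "; for (i = 3; i <= NF; i++) out = out $i; print out }')

printf '%s\n' "$result"
rm -f "$sample"
```

Note that a line the sed pattern does not match passes through unchanged, and the awk step would then squeeze its whole third column together, so this only suits input where every line matches.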
CodePudding user response:
The filter can be that the 5th column is 94, 95, 96, 97 or 99; then remove the spaces from that column onward and return the first 8 characters.
rq (https://github.com/fuyuncat/rquery/releases) provides the inline functions replace and substr to do these jobs.
[ rquery]$ ./rq -q "s @1,@2,substr(replace(substr(@raw,strlen(@1 ' ' @2 ' ' @3 ' ' @4) 1),' ',''),0,8) | f @5 in (94,95,96,97,99)" samples/search9.txt
20/10/2022 19:00 99435181
20/10/2022 20:00 97123181
20/10/2022 21:00 96435567
20/10/2022 22:00 99435181
If you prefer regex matching, you can try regmatch to get the string and reglike to filter the rows.
CodePudding user response:
If you have GNU awk, one option is to use the gensub() function, e.g.
gawk '{
a = gensub(/.*(9[45679] [0-9 ]{6,}).*/, "\\1", "g", $0) #extract the numbers and spaces
gsub(/ /, "", a) #remove the spaces
print $1, $2, a
}' test.txt
20/10/2022 19:00 99435181
20/10/2022 20:00 97123181
20/10/2022 21:00 96435567
20/10/2022 22:00 99435181
And I believe this will work with non-GNU awk, although I would need to test it to be sure:
awk '
match($0, / 9[45679] [0-9 ]{6,}/) { #match the regex
a = substr($0, RSTART + 1, RLENGTH - 1) #extract the numbers and spaces, skipping the leading space of the match
gsub(/ /, "", a) # remove the spaces
print $1, $2, a
}' test.txt
20/10/2022 19:00 99435181
20/10/2022 20:00 97123181
20/10/2022 21:00 96435567
20/10/2022 22:00 99435181
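Some awk builds do not support interval expressions such as {6,}. As a hedged sketch for that case, the interval can be spelled out by hand: five copies of [0-9 ] followed by [0-9 ]+ also means "at least six". The temporary file and the variable name result are assumptions made for the demo.

```shell
# Sample rows from the question, written to a throwaway file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
20/10/2022 19:00 test 1 99 435 18 1 more text
20/10/2022 20:00 test 2 97 123 1 81 more text2
20/10/2022 21:00 test 3 96 4 3 5567 more text3
20/10/2022 22:00 test 4 99 43 5181 more text4
EOF

# Same idea as the match() version above, with {6,} written out explicitly.
result=$(awk '
match($0, / 9[45679] [0-9 ][0-9 ][0-9 ][0-9 ][0-9 ][0-9 ]+/) {
    a = substr($0, RSTART + 1, RLENGTH - 1)  # skip the leading space of the match
    gsub(/ /, "", a)                         # remove the remaining spaces
    print $1, $2, a
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```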
CodePudding user response:
With GNU awk and your shown samples, please try the following awk code.
awk '
match($0,/^([0-9]{2}\/[0-9]{2}\/[0-9]{4})\s+([0-9]{2}:[0-9]{2})(\s+\S+){2}\s+([0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]*).*/,arr){
gsub(/ /,"",arr[4])
print arr[1],arr[2],arr[4]
}
' Input_file
Explanation: a detailed breakdown of the regex used.
^( ##Match from the start of the line and open the 1st capturing group.
[0-9]{2}\/[0-9]{2}\/[0-9]{4} ##Match 2 digits, a /, 2 digits, a /, then 4 digits.
) ##Close the 1st capturing group.
\s+ ##Match 1 or more whitespace characters.
( ##Open the 2nd capturing group.
[0-9]{2}:[0-9]{2} ##Match 2 digits, a colon, then 2 digits.
) ##Close the 2nd capturing group.
(\s+\S+){2} ##3rd capturing group: whitespace followed by non-whitespace, repeated 2 times.
\s+ ##Match 1 or more whitespace characters.
( ##Open the 4th capturing group.
[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]* ##Match digits, whitespace, digits, whitespace, digits, whitespace, then optional digits.
) ##Close the 4th capturing group.
.* ##Match everything up to the end of the line.
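The three-argument match() with an array, and the \s and \S escapes, are GNU awk extensions. As a rough sketch of the same positional idea for other awks, the digit groups can be collected field by field. This is a different technique from the regex above, and it assumes the numbers always start at field 5, as they do in the samples. The temporary file and the names result and digits are illustrative only.

```shell
# Sample rows from the question, written to a throwaway file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
20/10/2022 19:00 test 1 99 435 18 1 more text
20/10/2022 20:00 test 2 97 123 1 81 more text2
20/10/2022 21:00 test 3 96 4 3 5567 more text3
20/10/2022 22:00 test 4 99 43 5181 more text4
EOF

# Only handle rows whose 5th field starts with 9 followed by 4, 5, 6, 7 or 9.
result=$(awk '
$5 ~ /^9[45679]/ {
    digits = ""
    # Append fields for as long as they consist only of digits.
    for (i = 5; i <= NF && $i ~ /^[0-9]+$/; i++)
        digits = digits $i
    print $1, $2, digits
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Unlike the regex versions, this does not cap the result at 8 digits; it simply stops at the first field that is not purely numeric.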