I want to find a regex that matches uppercase and lowercase letters, with spaces allowed in between.
Below is a sample of the data I want to collect.
id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America
Inside a while loop, I check whether each record matches the regex. However, records with a space in the last field (such as North America or Latin America) are not picked up. Here is my code:
while read line; do
if [["$line"=~^.*,.*,[a-zA-Z ]*
I've also tried [a-zA-Z\n]*, but that does not work either.
Any ideas?
CodePudding user response:
You can use
rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
if [[ "$line" =~ $rx ]]; then
: # Do something (the ":" no-op keeps the empty branch syntactically valid)
fi
done < file
Details:
- ^ - string start
- [0-9]* - zero or more digits (looks like your ID column can only contain digits)
- , - a comma
- [^,]* - any zero or more chars other than , (.* is too generic and matches any text, thus it will report valid if the line contains more than three columns)
- , - a comma
- [[:alpha:][:space:]]* - zero or more letters or whitespace characters
- $ - end of string.
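To illustrate the point about .* being too generic, here is a minimal sketch that runs the pattern from this answer next to a variant that uses .* for the first two columns. The names strict, loose and matches, and the four-column sample line, are made up for this illustration.

```shell
#!/bin/bash
# Pattern from the answer: the first two columns cannot contain a comma.
strict='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
# Variant with .* for the first two columns, shown only for comparison.
loose='^.*,.*,[[:alpha:][:space:]]*$'

# matches REGEX STRING: succeeds if STRING matches REGEX (hypothetical helper).
matches() {
  [[ "$2" =~ $1 ]]
}

# The second sample line has four columns and should be rejected.
for line in '3,Mike,North America' '3,Mike,Extra,North America'; do
  if matches "$strict" "$line"; then s=Valid; else s=Invalid; fi
  if matches "$loose" "$line"; then l=Valid; else l=Invalid; fi
  echo "$line: strict=$s loose=$l"
done
```

With the loose pattern, .* can swallow commas, so the four-column line is accepted; [^,]* stops at the next comma, so the strict pattern rejects it.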
See the online demo:
#!/bin/bash
s='id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America'
rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
if [[ "$line" =~ $rx ]]; then
echo "$line: Valid"
else
echo "$line: Invalid"
fi
done <<< "$s"
Output:
id,name,continent: Invalid
1,Louise,Latin America: Valid
2,Sasha,Asia: Valid
3,Mike,North America: Valid
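As for the attempts in the question: inside a POSIX bracket expression, \n is not a newline escape, it is just a backslash and the letter n, so [a-zA-Z\n] still has no space in it. A literal space in the class does work once the pattern is stored in a variable, which also avoids the shell splitting an unquoted pattern at that space. The sketch below compares the three classes on one line; the check helper is a hypothetical name used only here.

```shell
#!/bin/bash
# check CLASS STRING: build the answer's pattern with CLASS as the
# third-column character class and report whether STRING matches.
# (Hypothetical helper, for illustration only.)
check() {
  local rx="^[0-9]*,[^,]*,${1}*\$"
  if [[ "$2" =~ $rx ]]; then echo Valid; else echo Invalid; fi
}

line='3,Mike,North America'

# Compare the class from the question, a class with a literal space,
# and the POSIX classes used in the answer.
for class in '[a-zA-Z\n]' '[a-zA-Z ]' '[[:alpha:][:space:]]'; do
  printf '%s: %s\n' "$class" "$(check "$class" "$line")"
done
```

Only the first class should report Invalid for this line, since it is the only one that cannot match the space in North America.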