Regex to match uppercase, lowercase and spaces in between?

I am looking for a regex that matches uppercase letters, lowercase letters, and spaces in between.

Below is a sample of the data I want to match:

id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America

Inside a while loop, I check whether each record matches the regex. However, records whose last field contains a space (such as North America or Latin America) are not picked up. Here is my code:

while read line; do
  if [["$line"=~^.*,.*,[a-zA-Z ]*

I've also tried [a-zA-Z\n]*, but it does not work either.

Any idea?

CodePudding user response:

You can use

rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
  if [[ "$line" =~ $rx ]]; then
    : # Do something with the matching line
  fi
done < file

Details:

^ - string start
[0-9]* - zero or more digits (looks like your ID column can only contain digits)
, - a comma
[^,]* - zero or more chars other than , (.* is too generic: it matches any text, including commas, so a line with more than three columns would also be reported as valid)
, - a comma
[[:alpha:][:space:]]* - zero or more letters or whitespace chars ([:space:] also covers tabs, not only plain spaces)
$ - end of string.
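
To illustrate the point about [^,]* versus .*, here is a minimal sketch that runs both patterns against a line with four columns. The sample line and the helper name check are made up for illustration and do not come from the question.

```shell
#!/bin/bash
# Strict pattern from the answer: exactly three comma-separated columns.
strict='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
# Loose pattern in the spirit of the question: .* can swallow commas.
loose='^.*,.*,[[:alpha:][:space:]]*$'

# check PATTERN LINE: print "match" or "no match" (hypothetical helper).
check() {
  if [[ "$2" =~ $1 ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

line='3,Mike,extra,North America'   # made-up line with four columns
echo "strict: $(check "$strict" "$line")"
echo "loose:  $(check "$loose" "$line")"
```

The strict pattern should reject the four-column line, because the last field may only contain letters and whitespace, while the loose one accepts it, because the first .* is free to consume "3,Mike".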

A complete demo script:

#!/bin/bash
s='id,name,continent
1,Louise,Latin America
2,Sasha,Asia
3,Mike,North America'
rx='^[0-9]*,[^,]*,[[:alpha:][:space:]]*$'
while read -r line; do
  if [[ "$line" =~ $rx ]]; then
      echo "$line: Valid"
  else
      echo "$line: Invalid"
  fi
done <<< "$s"

Output:

id,name,continent: Invalid
1,Louise,Latin America: Valid
2,Sasha,Asia: Valid
3,Mike,North America: Valid
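
A note on why the pattern is stored in a variable: inside [[ ... ]], an unquoted space splits the expression into separate words, so a pattern with a literal space cannot be written inline as is, and quoting the right-hand side of =~ makes bash treat it as a literal string rather than a regex. The sketch below, assuming a reasonably recent bash with default settings, shows the difference between the unquoted and the quoted variable. The variable names regex_result and literal_result are only for illustration.

```shell
#!/bin/bash
line='3,Mike,North America'
rx='^[0-9]*,[^,]*,[a-zA-Z ]*$'   # literal space inside the bracket expression

# Unquoted variable: its contents are interpreted as a regex.
if [[ "$line" =~ $rx ]]; then
  regex_result="match"
else
  regex_result="no match"
fi

# Quoted variable: its contents are matched as a literal string,
# so the regex metacharacters lose their meaning and the test fails.
if [[ "$line" =~ "$rx" ]]; then
  literal_result="match"
else
  literal_result="no match"
fi

echo "unquoted rx: $regex_result"
echo "quoted rx:   $literal_result"
```

This is also why [a-zA-Z ]* is a workable character class for this data as long as it reaches =~ through an unquoted variable.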