Can I use a regular expression to help format this data to separate name, age, and address?

Time:10-20

I am working on an assignment for class, and we need to format this data. I thought regular expressions would be an elegant way to do it, but I ran into some trouble. This is my first time using them, and I do not know how to split the data properly. I want everything from the beginning of the line up to the first digit to be the first section, from the first digit up to the next whitespace to be the second section, and from there to the end of the line to be the third section. Here is my data:

Amber-Rose Bowen 53    123 Machinery Rd.
Joyce Kirkland 19 234 Cylinder Dr.
Seb Dotson 32 3456 Surgery Ln. 
Dominique Hough 58 654 Election Rd.
Yasemin Mcleod 29 555 Cabinet Ave.
Nancy Lord 80       232 Highway Rd.
Tracy Mckenzie 72 101 Device Ave.
Alistair Salter 25 109 Guitar Ln.
Adeel Sears 42 222 Solitare Rd.

I have been using https://regex101.com/ to test my ideas. My start is ([a-zA-Z]+)([0-9]+), but I do not know how to match everything from the start of the line to the first digit (or any other part of this).

CodePudding user response:

You can use

^(.*?)[^\S\r\n]+(\d+)[^\S\r\n]+(\S.*)

See the regex demo. This regex can also be used with a multiline flag to extract data from a multiline string.

Details:

  • ^ - start of string (start of each line when the multiline flag is on)
  • (.*?) - Group 1: zero or more chars other than line break chars, as few as possible
  • [^\S\r\n]+ - one or more horizontal whitespaces (in some regex flavors, you can use \h+ or [\p{Zs}\t]+ instead)
  • (\d+) - Group 2: one or more digits
  • [^\S\r\n]+ - one or more horizontal whitespaces
  • (\S.*) - Group 3: a non-whitespace char and then the rest of the line
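A minimal Python sketch of this approach, assuming the records arrive as one newline-separated string. The names LINES, DATA, RECORD_RE and parse_records are illustrative, and converting the age to int and stripping trailing whitespace from the address are additions, not part of the answer. In Python, the multiline flag is re.MULTILINE.

```python
import re

# Sample lines from the question; note the irregular spacing and the
# trailing space on the "Seb Dotson" line.
LINES = [
    "Amber-Rose Bowen 53    123 Machinery Rd.",
    "Joyce Kirkland 19 234 Cylinder Dr.",
    "Seb Dotson 32 3456 Surgery Ln. ",
    "Dominique Hough 58 654 Election Rd.",
    "Yasemin Mcleod 29 555 Cabinet Ave.",
    "Nancy Lord 80       232 Highway Rd.",
    "Tracy Mckenzie 72 101 Device Ave.",
    "Alistair Salter 25 109 Guitar Ln.",
    "Adeel Sears 42 222 Solitare Rd.",
]
DATA = "\n".join(LINES)

# The answer's pattern; re.MULTILINE lets ^ match at the start of every line.
RECORD_RE = re.compile(r"^(.*?)[^\S\r\n]+(\d+)[^\S\r\n]+(\S.*)", re.MULTILINE)


def parse_records(text):
    """Return a list of (name, age, address) tuples, one per matching line."""
    records = []
    for m in RECORD_RE.finditer(text):
        name, age, address = m.groups()
        # rstrip() drops trailing whitespace that (\S.*) would otherwise keep.
        records.append((name, int(age), address.rstrip()))
    return records


for name, age, address in parse_records(DATA):
    print(f"{name:<18} {age:>3}  {address}")
```

Because (.*?) is lazy and the whitespace class excludes line breaks, each match stays on one line and Group 1 stops at the last space before the first run of digits.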

CodePudding user response:

If you merely wish to separate the string into full name, age and street address, you may split the string on matches of the expression

(?i)(?<=[a-z]|\d) +(?=\d)

For example:

Amber-Rose Bowen 53    123 Machinery Rd.
                ^  ^^^^

Demo

This expression reads: "match one or more spaces preceded by a letter or digit and followed by a digit". (?i) makes the letter match case-insensitive. (?<=[a-z]|\d) is a positive lookbehind; (?=\d) is a positive lookahead. Because lookarounds are zero-width, only the spaces are consumed by the split.
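A short Python sketch of the split approach, assuming one record per call. SPLITTER and split_record are hypothetical names. Python's re accepts this lookbehind because both alternatives, [a-z] and \d, are one character wide.

```python
import re

# The answer's split pattern; (?i) makes [a-z] match upper-case letters too.
SPLITTER = re.compile(r"(?i)(?<=[a-z]|\d) +(?=\d)")


def split_record(line):
    """Split one line into [full name, age, street address]."""
    return SPLITTER.split(line)


fields = split_record("Amber-Rose Bowen 53    123 Machinery Rd.")
print(fields)  # ['Amber-Rose Bowen', '53', '123 Machinery Rd.']
```

The spaces inside the name and the address are never followed by a digit, so they are left alone; only the two separators marked with ^ in the example are split on.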


You may use the following regular expression if you wish to extract first name, last name, age, street number and street name.

^(?<first_name>\S+) +(?<last_name>\S+) +(?<age>\d+) +(?<street_nbr>\d+) +(?<street_name>.*)

For example:

Amber-Rose Bowen 53    123 Machinery Rd.
^^^^^^^^^^ ^^^^^ ^^    ^^^ ^^^^^^^^^^^^^
     1       2   3      4       5

1: first_name
2: last_name
3: age
4: street_nbr
5: street_name

Demo

I've used the PCRE regex engine with named capture groups. The expression would be similar for other regex engines, though some spell named groups differently (Python's re module, for example, uses (?P<name>...)) and some do not support named groups at all, in which case you would have to use unnamed groups (group 1, group 2, and so forth).

Note that this only works because of the consistent structure of your data. In real life some strings may contain such things as middle names or apartment numbers, which would complicate the parsing of the strings.
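For Python users, here is a minimal sketch of the five-field pattern rewritten with Python's (?P<name>...) syntax. NAMED_RE and parse_named are hypothetical names, and the "Mary Ann Smith" line is an invented example used only to illustrate the caveat above: a line with a middle name does not fit the fixed layout, so the match fails.

```python
import re

# Python's re module writes named groups as (?P<name>...), not (?<name>...).
NAMED_RE = re.compile(
    r"^(?P<first_name>\S+) +(?P<last_name>\S+) +(?P<age>\d+)"
    r" +(?P<street_nbr>\d+) +(?P<street_name>.*)"
)


def parse_named(line):
    """Return a dict of the five named fields, or None if the line doesn't fit."""
    m = NAMED_RE.match(line)
    return m.groupdict() if m else None


print(parse_named("Amber-Rose Bowen 53    123 Machinery Rd."))
# Hypothetical line with a middle name: the fixed five-field layout fails.
print(parse_named("Mary Ann Smith 40 12 Oak St."))  # None
```

Returning None rather than raising lets the caller collect the lines that need manual handling instead of silently mis-assigning fields.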
