Home > database >  How to parse and map multiple fields from 1 excel cell
How to parse and map multiple fields from 1 excel cell

Time:04-01

Example here

I've got a CSV parser and the file has multiple strings which I need to map to different fields within one excel cell.

So the example image above I have the postcode, state, and suburb which are all in the same cell - but I need to map these to their respective tables in the database.

How would I do this? The team is saying likely regex, if so what would be the regex I use?

Cheers guys!

I expect each type of data to be split into the relevant data table

CodePudding user response:

Regex they say? Then you will have two problems.

Tackle this with extracting the postcode, then a string split. Here in psuedo code:

For i = 1 to lines.Length
string s = lines[i];
string postCode = s[s.Length -4]; // Get postcode
s = s[0, s.Length -4].TrimEnd(); //Remove postcode and trim last space
arr[] = s.Split(" ");  // Split words into array
string state= arr[arr.Length - 1];   //The last word is the state
string suburb; 
  For j = 0 to arr.Length - 1
     suburb  = arr[j];
  Next j
Next i

CodePudding user response:

Not sure what language/framework you're using, but I'm just posting this in vanilla javascript.

The regex I'd use for each cell is:

/([\w\s] )\s([A-Z]{2,3})\s(\d{4})/

Which breaks down into:

  1. ([\w\s] ) capture multiple words or spaces
  2. \s followed by a space
  3. ([A-Z]{2,3}) capture a group of two or three capital letters
  4. \s followed by a space
  5. (\d{4}) capture four digits

I always find https://www.regexpal.com really useful to write and test my regexes.

Using that in javascript:

const addresses = [
  'Elizabeth Bay NSW 2011',
  'Sydney NSW 2000',
  'St Kilda VIC 3182',
  'Sausage 1234567890',
  'Middle Park QLD 4047',
  'The Big Pineapple QLD 4559',
  'Uluru NT 0872'
]

const reg = /([\w\s] )\s([A-Z]{2,3})\s(\d{4})/

const matches = addresses.map(address => {
  const match = address.match(reg)
  return match && match.slice(1)
})

console.log(matches)

Will give you:

// Matching string, Suburb, State, Postcode
["Elizabeth Bay NSW 2011", "Elizabeth Bay", "NSW", "2011"]
["Sydney NSW 2000", "Sydney", "NSW", "2000"]
["St Kilda VIC 3182", "St Kilda", "VIC", "3182"]
null // The non-matching entry
["Middle Park QLD 4047", "Middle Park", "QLD", "4047"]
["The Big Pineapple QLD 4559", "The Big Pineapple", "QLD", "4559"]
["Uluru NT 0872", "Uluru", "NT", "0872"]
  • Related