Home > Software design >  Extract the last 1 or 2 alphabetic character(s) from a quantity (mg,g,ml,l,cm,mm,m) when preceded by
Extract the last 1 or 2 alphabetic character(s) from a quantity (mg,g,ml,l,cm,mm,m) when preceded by

Time:08-07

I need to be able to populate a cell in a Google Sheets spreadsheet with the measurement units extracted from the end of a string value in another cell. The raw data comes through with every source cell ending with a measurement unit, either preceded with a numeric value or not, as in the example data below...

SAMPLE DATA:

Colgate Plax Spearmint Alcohol Free Mouthwash 500ml
Peckish Tangy BBQ Rice Crackers 100g
Alison's Pantry BBQ Chickpea Snacks kg
Yoghurt Raisins Miscellaneous Confectionery kg
Roasted Unsalted Supreme Mixed Nuts kg
Alison's Pantry Honey & Dijon Snippets kg
Banana Chips kg
Sealord Satay Tuna 95g
Sealord Savoury Onion Tuna 95g
Coca-Cola No Sugar Soft Drink 2.25l
Tongariro Natural Spring 15l
Trident Sweet Chilli Sauce With Ginger 285ml
Pams Lite Whole Egg Mayonnaise 443ml
Value Lite Milk 2l
Morning Harvest Caged Size 7 Eggs 12pk

EXPECTED RESULT:

![New column showing the measurement units][1]

CURRENT METHODOLOGY:

=IF(A1<>"",REGEXEXTRACT(A1,"^.*([a-zA-Z][a-zA-Z])$|^.*([a-zA-Z])$"),"")

CURRENT RESULT:

![Result being split over two columns][2]

While I can combine the two values into a third column using the expression =IF(B1<>"",B1,IF(C1<>"",C1,"")), this becomes messy, convoluted, and adds unnecessary columns. I would prefer to tweak the regular expression to return just a single value, either the one or two character measurement unit. I have no idea how to achieve this, though. Any help would be appreciated.

CodePudding user response:

You could also make the pattern a bit more specific matching either a digit of space, and capture one of the units at the end of the string.

=IF(A1<>"",REGEXEXTRACT(A1, "[\d ]((?:m?l|[mk]?g|pk|[cm]?m))$"),"")

See a enter image description here

CodePudding user response:

Match 1 optional letter, then 1 letter anchored to end:

IF(A1<>"",REGEXEXTRACT(A1, "[a-zA-Z]?[a-zA-Z]$"),"")
  • Related