Hello and thank you in advance,
I buy items that have a variety of human written listings on auction sites and forums. Often times, the quantity is clear to a person, but extracting it has been a real challenge. I'm using google sheets and REGEXEXTRACT().
I consider myself to be a intermediate regex user, but this has me stumped, so I need an expert.
Here's a few examples, my desired return, and what I'm getting.
Listing | Desired Return | Actual Return |
---|---|---|
Red 1996 Corvette 2x - Matchbox | 2 | 2 |
3 x SmartCar, broken 2nd door | 3 | 3 |
2nd edition Kindle (x3) | 3 | 3 |
**1x** 2008 financial crash notice | 1 | 1 |
Collectors Edition Beannie Baby, item 204/343 | 1 | 4 |
(6) Nissan window motors (1995-1998 ONLY) | 6 | N/A |
White chevy F150, 1996 | 1 | 6 |
Green bowl, cracked (stored in room 2A5) | 1 | 5 |
As I thought through this, I think I can put some reasonable limitations on this logic, but the code is harder.
- The quantities will only be a single number 1-9. (perhaps reject all numbers > 9?)
- They'll possibly be precede by or followed by an X or x, with or without a space
- The quantity may be next to a special character like * , () or -
- It should ignore all 1st, 2nd, 3rd, - 9th style notation
- If a number is mixed in a word, like 2A3, it should ignore all
Obviously most description don't have any quantity, so if there's no return or zero, that's fine.
I have something that feels close, and does a reasonable job:
[^a-wy-zA-WY-Z0-9]*([1-4]){1}([^a-wA-w0-9]|$)
It doesn't return anything with the returns marked of 1*, and that's fine. It breaks on the last two, and I've struggled for too long!
Thanks in advance!
CodePudding user response:
You can use
=IFNA(INT(REGEXEXTRACT(REGEXREPLACE(LOWER(A27), "\d{2,}|(x\d)|(\dx)|[^\W\d] \d\w*|\d [^\W\d]\w*", "$1$2"), "(\d)")), 1)
Here,
REGEXREPLACE(LOWER(A27), "\d{2,}|(x\d)|(\dx)|[^\W\d] \d\w*|\d [^\W\d]\w*", "$1$2")
finds and removes chunks of two or more digits, or chunks with a digit and at least one letter, but keeps the sequences where a digit is preceded or followed withx
REGEXEXTRACT(..., "(\d)"))
extracts the first digit left after the replacement=IFNA(INT(...), 1)
either casts the found digit to integer, or, if there was no match, inserts1
into the column.