I am trying to separate information copied from a PDF table - id usually use text to columns but the only delamination is spaces and this then splits the data into multiple unusable columns
The data comes like this:
Raw Data |
---|
A1 Company 0 |
Company2 40000 |
name a 1 |
name b 15 |
name c 184 |
Big 17 Company 1887 |
I need the output to be:
Company | Units |
---|---|
A1 Company | 0 |
Company2 | 40000 |
name a | 1 |
name b | 15 |
name c | 184 |
Big 17 Company | 1887 |
So the company name (that might contain numbers) is separated for the unit number (that could be 1-5 digits long).
I haven't been able to figure out a way that uses =len() as the string length isn't a constant mixed with the last numbers not being a consistent number of digits.
I'm currently using:
=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1)) * ROW(INDIRECT("1:"&LEN(A2))), 0), ROW(INDIRECT("1:"&LEN(A2)))) 1, 1) * 10^ROW(INDIRECT("1:"&LEN(A2)))/10)
This gives me all the numbers in the cell - which works for 90% of the data as most of the company's don't have numbers in their name. But for something like 'A1 Company 0' it gives 10 as the output not just the 0. I then go and manually edit the small number of companies that this happens too.
I then use a mixture of =LEN()
=LEFT
and =RIGHT
to split the information up as required for the further automated analysis.
I'd prefer a formula over VBA/macro
I cant provide the actual data but I hope I've given enough examples in the table above to show the main problems (different company name lengths, companies with numbers in their name, different amount of digits representing the units)
CodePudding user response:
Using Libre Office, but this formula checks for the last space in the cell
=RIGHT(A1,LEN(A1)-FIND("@",SUBSTITUTE(A1," ","@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))
CodePudding user response:
See if the following works for you:
Formula in B2
:
=LEFT(A2,LEN(A2)-1-LEN(C2))
In C2
:
=-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))
For those users using ms365's newest functions:
=HSTACK(TEXTBEFORE(A2," ",-1),TEXTAFTER(A2," ",-1))