I have a large data set that is messed up, and I tried to clean it. The data looks like this:
data= np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36]) #...
My goal is to extract the raw numbers:
numbers= [51, 64, 76, 84, 100, 136]
What I tried worked, but I think it is not that elegant. Is there a better way to do it?
import numpy as np
import re
clean = np.array([])
for i in data:
    i = str(i)
    if ',' in i:
        without = i.replace(',', '')
        clean = np.append(clean, without)
    elif '.' in i:
        without = i.replace('.', '')
        clean = np.append(clean, without)
# detect all numbers
numbers = np.array([])
for i in clean:
    if type(i) == np.str_:
        a = re.findall(r'\b\d+\b', i)
        numbers = np.append(numbers, a)
CodePudding user response:
Generally, you should never use np.append in a loop: it creates a new array on every call, which copies all existing elements and makes the loop quadratic in the number of items.
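As a minimal sketch of the usual alternative, values can be collected in a plain Python list and turned into an array once at the end. The names values, collected and arr are illustrative only and do not come from the question.

```python
import numpy as np

# Hypothetical sample values, used only for illustration.
values = ['051', '064', '076']

# list.append is amortized O(1), so the loop stays linear overall.
collected = []
for v in values:
    collected.append(int(v))

# The array is allocated a single time, after the loop.
arr = np.array(collected)
```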
Besides this, you can use the following one-liner to solve your problem:
result = [round(float(n.replace(',', '.')) * 100) for e in data for n in e.split()]
The idea is to replace each ',' with '.' and parse the string as a float, then multiply by 100 and convert to an integer. You can convert the result to a NumPy array with np.fromiter(result, dtype=int).
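For reference, here is the same approach written out as a runnable sketch on the sample data from the question. The helper name to_hundredths is hypothetical. It rounds rather than truncates, because floating-point products such as 0.29 * 100 can land just below the intended integer.

```python
import numpy as np

# Sample data from the question; NumPy stores both elements as strings.
data = np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36])

def to_hundredths(values):
    """Hypothetical helper: parse decimal strings and scale them by 100."""
    result = []
    for element in values:
        # split() with no argument splits on any whitespace, including '\n'.
        for token in str(element).split():
            number = float(token.replace(',', '.'))
            # round() avoids truncation errors such as int(0.29 * 100) == 28.
            result.append(round(number * 100))
    return result

numbers = to_hundredths(data)
arr = np.fromiter(numbers, dtype=int)
```

Here numbers is a plain list of Python integers and arr is the equivalent one-dimensional integer array.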