I want to keep only the numbers from a NumPy array of strings, not all of which are valid numbers. My code looks like the following:
age = train['age'].to_numpy() # 200k values
set(age)
# {'1', '2', '3', '7-11', np.nan...}
age = np.array(['1', '2', '3', '7-11', np.nan])
Desired output:
np.array([1, 2, 3])
Ideally, '7-11' would become 7, but that is not simple, so losing it is tolerable.
np.isfinite(x)
gives "ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''"
x = [num for num in age if isinstance(num, (int, float))]
returns []
CodePudding user response:
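You could do something like the following:
for pos, val in enumerate(age):
    try:
        new_val = int(val)
    except (ValueError, TypeError):
        # not a plain integer, e.g. '7-11' or 'nan'
        new_val = np.nan
    # age has a string dtype, so np.nan is stored as the string 'nan'
    age[pos] = new_val
age = age[age != "nan"].astype(int)
print(age)
> [1 2 3]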
CodePudding user response:
Here's an option that will split strings on '-' first, and only take the first value, so '7-11' is converted to 7:
age = np.array(['1', '2', '3', '7-11', np.nan])
age_int = np.array([int(x[0]) for x in np.char.split(age, sep='-') if x[0].isdecimal()])
Output: array([1, 2, 3, 7])
There is a more efficient way to do this if you don't care about cases like '7-11':
age_int2 = age[np.char.isdecimal(age)].astype(int)
Output2: array([1, 2, 3])
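One caveat: the np.char functions only accept string arrays, so both snippets above work on the example array (where np.nan has already become the string 'nan') but would likely fail on an object array holding real NaN floats, which is what train['age'].to_numpy() typically returns. Below is a minimal sketch that converts to strings first. The helper name leading_ints is hypothetical, and using np.char.partition instead of np.char.split is a swapped-in choice, not part of the answer above.

```python
import numpy as np

def leading_ints(values):
    """Return the integer before the first '-' in each entry, dropping the rest.

    Hypothetical helper for illustration; assumes a 1-D input.
    """
    # astype(str) turns a real NaN into the string 'nan', so np.char accepts it
    as_str = np.asarray(values, dtype=object).astype(str)
    # partition yields (before, sep, after) per element; keep the 'before' part
    first = np.char.partition(as_str, '-')[:, 0]
    # keep only entries made up entirely of decimal digits
    mask = np.char.isdecimal(first)
    return first[mask].astype(int)

# object array with a real NaN, similar to what pandas would return
age = np.array(['1', '2', '3', '7-11', np.nan], dtype=object)
print(leading_ints(age))
```

With this version '7-11' contributes 7, while 'nan' and any other non-numeric entry are dropped.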