I tried this, inspired by the accepted answer here:
import pandas as pd

df = pd.DataFrame({'col1': ['1', '']})
all_numeric = pd.to_numeric(df['col1'], errors='coerce').notnull().all().item()
print(all_numeric)
to detect whether a column is numeric (ignoring blanks/NULLs/NaNs).
In the above code all_numeric is False (a Python bool), which does not make sense to me, or maybe it does? I thought I would try to impute NaN, since the empty value might be the reason:
import numpy as np

df = pd.DataFrame({'col1': ['1', '']})
df = df.apply(lambda x: x.str.strip()).replace('', np.nan)
all_numeric = pd.to_numeric(df['col1'], errors='coerce').notnull().all().item()
print(all_numeric)
Same outcome. Maybe my way of checking whether all values of a column are numeric (ignoring NULL/NaN/empty strings) is wrong? Thanks!
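For reference, printing the intermediate Series on the same data shows what the check is actually evaluating (just diagnostic prints, same logic as above):

coerced = pd.to_numeric(df['col1'], errors='coerce')
print(coerced)            # 0    1.0
                          # 1    NaN
print(coerced.notnull())  # 0     True
                          # 1    False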
CodePudding user response:
According to the official pandas docs, you can check whether each value of a Series is numeric with Series.str.isnumeric():
>>> df = pd.DataFrame({'col1': ['1', '']})
>>> df.col1.str.isnumeric()
0     True
1    False
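If the goal is a single boolean that ignores blanks, a minimal sketch building on this (using the df above) could drop the empty strings first; note that str.isnumeric() only matches digit-only strings, so values like '-1' or '1.5' would not count as numeric:

# Drop blanks, then test whatever remains.
non_blank = df['col1'].str.strip().replace('', pd.NA).dropna()
all_numeric = non_blank.str.isnumeric().all()   # True for ['1', ''] because the blank is dropped
print(all_numeric)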
CodePudding user response:
You could strip whitespace, convert empty strings to NaN and drop them, then run the test:
out = pd.to_numeric(df['col1'].str.strip().replace('', pd.NA).dropna(), errors='coerce').notna().all().item()
Output:
True
This test returns False for the following input:
df = pd.DataFrame({'col1':['1', 's']})
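For reuse, the same check can be wrapped in a small helper; this is just a sketch, and the name all_values_numeric is only illustrative:

import pandas as pd

def all_values_numeric(s: pd.Series) -> bool:
    # True when every non-blank, non-null value in a string column parses as a number.
    cleaned = s.str.strip().replace('', pd.NA).dropna()
    return pd.to_numeric(cleaned, errors='coerce').notna().all().item()

print(all_values_numeric(pd.DataFrame({'col1': ['1', '']})['col1']))   # True
print(all_values_numeric(pd.DataFrame({'col1': ['1', 's']})['col1']))  # False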