I tried this, inspired by the accepted answer here:
import pandas as pd

df = pd.DataFrame({'col1': ['1', '']})
all_numeric = pd.to_numeric(df['col1'], errors='coerce').notnull().all().item()
print(all_numeric)
to detect whether a column is numeric (ignoring blanks/NULLs/NaNs).
In the above code all_numeric is False (a Python bool), which does not make sense to me, or maybe it does? I thought I would try to impute NaN, since the empty value might be the reason:
import numpy as np

df = pd.DataFrame({'col1': ['1', '']})
df = df.apply(lambda x: x.str.strip()).replace('', np.nan)
all_numeric = pd.to_numeric(df['col1'], errors='coerce').notnull().all().item()
print(all_numeric)
Same outcome. Maybe my way of checking whether all values of a column are numeric (ignoring NULL/NaN/empty strings) is wrong? Thanks!
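For reference, printing the intermediate Series on the same data shows what the check is actually evaluating (just diagnostic prints, same logic as above):

coerced = pd.to_numeric(df['col1'], errors='coerce')
print(coerced)            # 0    1.0
                          # 1    NaN
print(coerced.notnull())  # 0     True
                          # 1    False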
CodePudding user response:
According to the official pandas docs, you can check whether each value of a Series is numeric with Series.str.isnumeric():
>>> df = pd.DataFrame({'col1': ['1', '']})
>>> df.col1.str.isnumeric()
0     True
1    False
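If the goal is a single boolean that ignores blanks, a minimal sketch building on this (using the df above) could drop the empty strings first; note that str.isnumeric() only matches digit-only strings, so values like '-1' or '1.5' would not count as numeric:

# Drop blanks, then test whatever remains.
non_blank = df['col1'].str.strip().replace('', pd.NA).dropna()
all_numeric = non_blank.str.isnumeric().all()   # True for ['1', ''] because the blank is dropped
print(all_numeric)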
CodePudding user response:
You could strip whitespace, convert empty strings to NaN and drop them, then run the test:
out = pd.to_numeric(df['col1'].str.strip().replace('', pd.NA).dropna(), errors='coerce').notna().all().item()
Output:
True
This test returns False for the following input:
df = pd.DataFrame({'col1':['1', 's']})
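For reuse, the same check can be wrapped in a small helper; this is just a sketch, and the name all_values_numeric is only illustrative:

import pandas as pd

def all_values_numeric(s: pd.Series) -> bool:
    # True when every non-blank, non-null value in a string column parses as a number.
    cleaned = s.str.strip().replace('', pd.NA).dropna()
    return pd.to_numeric(cleaned, errors='coerce').notna().all().item()

print(all_values_numeric(pd.DataFrame({'col1': ['1', '']})['col1']))   # True
print(all_values_numeric(pd.DataFrame({'col1': ['1', 's']})['col1']))  # False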