How to deal with strings in a numeric column in pandas?

Time:08-11

I have a big dataset and I cannot convert a column's dtype from object to int because of the error "invalid literal for int() with base 10:". After some research, I found that this happens because some of the values in the column are non-numeric strings.

How can I find those strings and replace them with numeric values?

CodePudding user response:

You might be looking for .str.isnumeric(), which lets you separate the values that are numbers stored as strings from the ones that are not, so you can act on each group independently. You'll still need to decide what the non-numeric values should be:

  • converted (maybe they're money amounts you want to truncate, or dates in a format other than a UNIX epoch, or any number of other possibilities)
  • dropped (just throw them away)
  • something else
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ["1", "2", "x"]})
>>> df
   a
0  1
1  2
2  x
>>> df[df["a"].str.isnumeric()]
   a
0  1
1  2
>>> df[~df["a"].str.isnumeric()]
   a
2  x
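
As a follow-up to the filter above, here is a minimal sketch of the "converted" option: replace the non-numeric entries and then cast the column. The `replacements` mapping (turning "x" into "0") is purely an assumption for illustration; what each bad value should become depends on your data. Keep in mind that .str.isnumeric() is also False for strings such as "-1" or "1.5", so this approach only suits columns of non-negative whole numbers.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "x"]})

# Boolean mask of rows whose value is not made up purely of numeric characters
bad = ~df["a"].str.isnumeric()

# Hypothetical replacement table: decide per value what it should become
replacements = {"x": "0"}
df.loc[bad, "a"] = df.loc[bad, "a"].map(replacements)

# Every value is now a digit string, so the cast to int succeeds
df["a"] = df["a"].astype(int)
```

If a bad value is missing from the mapping, .map() leaves NaN in its place and the final cast fails, which is a useful signal that a case was overlooked.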

CodePudding user response:

Assuming 'col' is the column name, just force the conversion to numeric; any value that cannot be parsed becomes NaN:

df['col_num'] = pd.to_numeric(df['col'], errors='coerce')

If needed, you can check which original values produced NaN using:

df.loc[df['col'].notna()&df['col_num'].isna(), 'col']
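
Putting the two steps together, a minimal end-to-end sketch might look like the following. The sample data, the fill value of 0, and the variable names `failed`, `as_nullable` and `as_int` are assumptions for illustration only. Because NaN is a float, the coerced column is float64; to get integers you can either use the nullable "Int64" dtype or fill the gaps before casting.

```python
import pandas as pd

# Illustrative data: two numbers, one stray string, one genuinely missing value
df = pd.DataFrame({"col": ["1", "2", "x", None]})

# Unparseable values become NaN instead of raising an error
df["col_num"] = pd.to_numeric(df["col"], errors="coerce")

# Values that were present originally but could not be parsed
failed = df.loc[df["col"].notna() & df["col_num"].isna(), "col"]

# Option A: nullable integer dtype, missing values stay missing (<NA>)
as_nullable = df["col_num"].astype("Int64")

# Option B: fill missing values with a default (0 is an assumption), then cast
as_int = df["col_num"].fillna(0).astype(int)
```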

CodePudding user response:

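"Base 10" only means that int() tried to parse the text as a decimal integer and failed. If the values are floats written as strings (for example "3.0"), int() rejects them directly, so in Python you would do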

int(float(____))

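Since you used int(), I'm guessing you need an integer value. Note that this truncates any fractional part, and it still raises a ValueError for text that is not a number at all.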
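
To illustrate the int(float(...)) approach on individual values, here is a small sketch. The helper name `to_int` and the choice to return None on failure are assumptions, not part of any library.

```python
def to_int(value):
    """Parse text as an integer, accepting float-looking input such as '3.7'.

    Returns None when the value cannot be interpreted as a finite number.
    """
    try:
        # float() accepts both "42" and "3.7"; int() then drops the fraction
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


results = [to_int(v) for v in ["42", "3.7", "x"]]
```

Here "3.7" is truncated to 3 rather than rounded, and "x" yields None instead of raising an error.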
