Dataframe: How to remove dot in a string

Time:03-14

I want to use categorical features directly with a CatBoost model, so I need to declare my object columns as categorical. I have a column in my data frame with dtype object that contains NACE codes and looks like this:

NACE_code

5632      81.101
8060      41.200
15147     43.120
24644     68.100
29144     86.909
37122         68
39853         43
59268         43
108633    70.220
108693    56.102
175820    43.320
184606    41.200
Name: NACE_code, dtype: object

Python doesn't accept this column as a categorical column. Instead, it tells me that this is a float, since some of the values have dots. I am relatively new to Python and have tried different ways to remove the dots from those values, but my last attempt changes all the values without a dot to NaN.

df['NACE_code'].str.replace(r"(\d)\.", r"\1")

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       NaN
39853       NaN
59268       NaN
108633    70220
108693    56102
175820    43320
184606    41200
Name: NACE_KODE, dtype: object

How do I get my column to look like this? I appreciate any help I can get!

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       68
39853       43
59268       43
108633    70220
108693    56102
175820    43320
184606    41200

CodePudding user response:

Use astype('str') to convert columns to string type before calling str.replace.

Without regex:

df['NACE_code'].astype('str').str.replace(r".", r"", regex=False)
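
A minimal sketch of why the cast matters, assuming the column mixes real strings with plain integers (the sample values below are hypothetical stand-ins, not the asker's data). The `.str` accessor returns NaN for elements that are not strings, which would match the NaN rows in the question:

```python
import pandas as pd

# Hypothetical sample: an object column holding both strings and integers.
s = pd.Series(["81.101", "41.200", 68, 43], name="NACE_code", dtype=object)

# Without the cast, .str.replace yields NaN for the non-string elements.
direct = s.str.replace(".", "", regex=False)

# Casting to str first turns every element into a string before replacing.
cleaned = s.astype(str).str.replace(".", "", regex=False)

print(direct.tolist())
print(cleaned.tolist())
```

If the column really does contain only strings, the cast is harmless and the two results are identical.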

CodePudding user response:

# The following code should work:
df.NACE_code = df.NACE_code.astype(str)
df.NACE_code = df.NACE_code.str.replace('.', '', regex=False)
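
As a rough illustration of the `regex` argument (the sample values are hypothetical): the default for `regex` in `str.replace` has changed between pandas versions, so passing it explicitly keeps the behaviour the same everywhere. A literal replace and an escaped regex give the same result, and the pattern from the question also works once the values are strings and `regex=True` is set:

```python
import pandas as pd

# Hypothetical sample values, already cast to strings.
s = pd.Series(["81.101", "41.200", "68"], name="NACE_code")

# Literal replacement: "." is treated as a plain dot.
literal = s.str.replace(".", "", regex=False)

# Regex replacement: the dot must be escaped to match only a real dot.
escaped = s.str.replace(r"\.", "", regex=True)

# The pattern from the question: keep the digit, drop the dot after it.
grouped = s.str.replace(r"(\d)\.", r"\1", regex=True)

print(literal.tolist())
```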

CodePudding user response:

Thanks for the response Aakash Dusane and gajendragarg!

When I run either of these, new digits appear at the end of the values without dots. The output is:

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122      6811
39853      4311
59268      4311
108633    70220
108693    56102
175820    43320
184606    41200

Do you know why?
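
One way to investigate, sketched on hypothetical values because the real column is not available here: look at the Python type of each element and at the exact text that `astype(str)` produces, before any dots are removed. This does not by itself explain the extra digits; it only shows what the replace step is actually working on.

```python
import pandas as pd

# Hypothetical stand-in for the real column: a string, an int and a float.
s = pd.Series(["81.101", 68, 43.0], name="NACE_code", dtype=object)

# Count how many elements there are of each Python type.
type_counts = s.map(lambda v: type(v).__name__).value_counts().to_dict()

# The exact strings that astype(str) hands to str.replace.
as_text = s.astype(str).tolist()

print(type_counts)
print(as_text)
```

On the real data, the same two checks would be `df['NACE_code'].map(lambda v: type(v).__name__).value_counts()` and `df['NACE_code'].astype(str).tolist()`.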
