Dataframe: How to remove dot in a string

I want to use categorical features directly with a CatBoost model, so I need to declare my object columns as categorical. One column in my data frame is an object column containing NACE codes that look like this:

NACE_code

5632      81.101
8060      41.200
15147     43.120
24644     68.100
29144     86.909
37122         68
39853         43
59268         43
108633    70.220
108693    56.102
175820    43.320
184606    41.200
Name: NACE_code, dtype: object

Python doesn't accept this column as a categorical column. Instead, it tells me that it is a float, since some of the values contain dots. I am relatively new to Python and have tried different ways to remove the dots from those values, but my last attempt changes all the values without a dot to NaN.

df['NACE_code'].str.replace(r"(\d)\.", r"\1")

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       NaN
39853       NaN
59268       NaN
108633    70220
108693    56102
175820    43320
184606    41200
Name: NACE_code, dtype: object

How do I get my column to look like this? I appreciate any help I can get!

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122       68
39853       43
59268       43
108633    70220
108693    56102
175820    43320
184606    41200

CodePudding user response：

Use astype('str') to convert columns to string type before calling str.replace.

Without regex:

df['NACE_code'].astype('str').str.replace(r".", r"", regex=False)
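
To illustrate why the cast matters, here is a minimal sketch. It assumes the object column mixes real strings with plain integers, which is one way the NaN result in the question can arise; the sample values and the names s, without_cast and with_cast are made up for the illustration.

```python
import pandas as pd

# Hypothetical sample: an object column holding both strings and integers.
s = pd.Series(["81.101", "41.200", 68, 43], name="NACE_code", dtype=object)

# The .str methods return NaN for every element that is not a string.
without_cast = s.str.replace(".", "", regex=False)

# Casting to str first turns every element into a string, so nothing is lost.
with_cast = s.astype(str).str.replace(".", "", regex=False)

print(without_cast.tolist())
print(with_cast.tolist())
```

Note that str.replace returns a new Series, so the result has to be assigned back to the column to take effect.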

CodePudding user response：

# The following code should work:
df.NACE_code = df.NACE_code.astype(str)
# regex=False treats '.' as a literal dot rather than a regex wildcard
df.NACE_code = df.NACE_code.str.replace('.', '', regex=False)
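
If the result looks unexpected, it may help to check which Python types the object column actually holds before cleaning it. The sketch below uses a made-up frame (the values and the names type_counts and as_category are assumptions for the example, not the asker's real data) and finishes with an optional conversion to the pandas category dtype.

```python
import pandas as pd

# Hypothetical frame standing in for the real data.
df = pd.DataFrame({"NACE_code": ["81.101", "41.200", 68, 43]}, dtype=object)

# Count the Python types stored in the object column.
type_counts = df["NACE_code"].map(type).value_counts()
print(type_counts)

# Remove the dots literally and assign the result back to the column.
df["NACE_code"] = df["NACE_code"].astype(str).str.replace(".", "", regex=False)

# Optional: convert the cleaned strings to the pandas categorical dtype.
as_category = df["NACE_code"].astype("category")
print(as_category.dtype)
```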

CodePudding user response：

Thanks for the response Aakash Dusane and gajendragarg!

When I run either of these, new digits appear at the end of the values without dots. The output is:

5632      81101
8060      41200
15147     43120
24644     68100
29144     86909
37122      6811
39853      4311
59268      4311
108633    70220
108693    56102
175820    43320
184606    41200

Do you know why?