Substring a column in pandas

Time:11-10

I have a dataframe like this

Index Identifier
0 10769289.0
1 1082471174.0

The "Identifier" column is a string column, and I need to remove the ".0".

I'm using the following code:

Dataframe["Identifier"] = Dataframe["Identifier"].replace(regex=['.0'],value='')

But I got this:

Index Identifier
0 769289
1 82471174

As you can see it removed more than just the ".0". I also tried to use

Dataframe["Identifier"] = Dataframe["Identifier"].str.replace(".0", "")

but I got the same result.

CodePudding user response:

In a regular expression, the dot (.) matches any character, so the pattern .0 replaces any character followed by a zero. In your case that removes the 10 at the beginning of 10769289.0 and 1082471174.0 as well as the .0 at the end of each number. To match a literal decimal point, escape the dot: the pattern \.0 matches only a dot followed by a zero, which is what you intended.
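To see the difference outside pandas, here is a minimal sketch using Python's built-in re module on one of the sample values. The variable names are only illustrative.

```python
import re

identifier = "10769289.0"

# Unescaped: "." matches any character, so both "10" and ".0" match.
print(re.findall(".0", identifier))    # ['10', '.0']
unescaped = re.sub(".0", "", identifier)

# Escaped: "\." matches only a literal dot, so only ".0" matches.
print(re.findall(r"\.0", identifier))  # ['.0']
escaped = re.sub(r"\.0", "", identifier)

print(unescaped)  # 769289
print(escaped)    # 10769289
```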

import pandas as pd

# Create the dataframe as per the example
Dataframe = pd.DataFrame({"Index": [0,1], "Identifier": ['10769289.0', '1082471174.0']})

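# Remove the decimal point and the zero at the end of each Identifier.
# The raw string keeps the backslash intact, and "$" anchors the match to the end.
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default.
Dataframe["Identifier"] = Dataframe["Identifier"].str.replace(r"\.0$", "", regex=True)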

# Print the dataframe
print(Dataframe)

OUTPUT:

   Index  Identifier
0      0    10769289
1      1  1082471174
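
If no regular expression is needed at all, the trailing ".0" could also be stripped as a literal suffix. This is only a sketch and assumes pandas 1.4 or newer, where Series.str.removesuffix is available. The name df is illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({"Index": [0, 1], "Identifier": ["10769289.0", "1082471174.0"]})

# Assumption: pandas >= 1.4. removesuffix strips the literal text ".0"
# only when a value ends with it, so no escaping is involved.
df["Identifier"] = df["Identifier"].str.removesuffix(".0")

print(df["Identifier"].tolist())  # ['10769289', '1082471174']
```

Because the suffix is treated as plain text, values that do not end in ".0" are left unchanged.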