I have a dataframe like this
Index | Identifier |
---|---|
0 | 10769289.0 |
1 | 1082471174.0 |
The "Identifier" column is a string column, and I need to remove the ".0".
I'm using the following code:
Dataframe["Identifier"] = Dataframe["Identifier"].replace(regex=['.0'],value='')
But I got this:
Index | Identifier |
---|---|
0 | 769289 |
1 | 82471174 |
As you can see, it removed more than just the ".0". I also tried:
Dataframe["Identifier"] = Dataframe["Identifier"].str.replace(".0", "")
but I got the same result.
CodePudding user response:
The dot (.) in a regular expression matches any character, so you have to escape it to match a literal decimal point. Unescaped, the pattern .0 replaces any character followed by a zero. In your case that removes the 10 at the beginning of 10769289.0 and 1082471174.0, as well as the .0 at the end of each number. With the dot escaped, the pattern \.0 matches only a literal .0, which is what you intended.
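The difference can be checked with Python's built-in re module, independent of pandas. This is a minimal sketch that uses the first identifier from the question as sample input; the variable names are illustrative only.

```python
import re

identifier = "10769289.0"

# Unescaped: "." matches any character, so both "10" and ".0" are removed.
unescaped = re.sub(".0", "", identifier)

# Escaped: "\." matches only a literal dot, so only ".0" is removed.
escaped = re.sub(r"\.0", "", identifier)

print(unescaped)  # 769289
print(escaped)    # 10769289
```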
import pandas as pd
# Create the dataframe as per the example
Dataframe = pd.DataFrame({"Index": [0,1], "Identifier": ['10769289.0', '1082471174.0']})
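# Replace the decimal and the zero at the end of each Identifier.
# The raw string keeps the backslash intact, and regex=True is required
# because pandas 2.0+ treats the pattern as literal text by default.
Dataframe["Identifier"] = Dataframe["Identifier"].str.replace(r"\.0", "", regex=True)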
# Print the dataframe
print(Dataframe)
OUTPUT:
Index Identifier
0 0 10769289
1 1 1082471174
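If a regular expression is not needed at all, the ".0" can also be removed as plain text. The sketch below is one possible alternative, not part of the answer above: it assumes pandas 1.4 or newer (for str.removesuffix), and the names df, literal and suffix are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Identifier": ["10769289.0", "1082471174.0"]})

# regex=False treats ".0" as plain text, so no escaping is required.
literal = df["Identifier"].str.replace(".0", "", regex=False)

# removesuffix only strips ".0" when it is at the very end of the string.
suffix = df["Identifier"].str.removesuffix(".0")

print(literal.tolist())  # ['10769289', '1082471174']
print(suffix.tolist())   # ['10769289', '1082471174']
```

The suffix variant is the stricter of the two, since it leaves a ".0" that appears in the middle of a value untouched.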