Unable to remove special characters using a regular expression

Time:05-20

My data has a column, Kilometer, whose values look like "29261..". I need to remove the double dot (..) with a regular expression. I tried the code below, but it does not give the result I want:

import pandas as pd

df=[{
    "UNIQUESERIALNO":"abcd123",
    "Kilometer":"29261.."    
}]

df=pd.DataFrame.from_dict(df)
df['Kilometer'].replace(regex=True, inplace=True, to_replace=r'[^0-9.\*.\*]', value=r'')
print(df)

CodePudding user response:

You're almost there. Try this:

import pandas as pd

df=[{
    "UNIQUESERIALNO":"abcd123",
    "Kilometer":"29261.."    
}]

df=pd.DataFrame.from_dict(df)
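# \.{2} matches exactly two literal dots; assign the result back instead of relying on inplace=True on a column selection
df['Kilometer'] = df['Kilometer'].replace(to_replace=r'\.{2}', value='', regex=True)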
print(df)

Output:

  UNIQUESERIALNO Kilometer
0        abcd123     29261
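
As a quick check of why the pattern in the question leaves the value untouched, here is a minimal sketch using Python's built-in re module rather than pandas (an assumption made only to keep the example self-contained). The brackets in [^0-9.\*.\*] form a negated character class, so the pattern matches any character that is NOT a digit, a dot, or an asterisk. The dots are therefore never matched.

```python
import re

value = "29261.."

# Negated character class: matches anything except digits, '.', and '*'.
# Every character in the value is a digit or a dot, so nothing is removed.
with_question_pattern = re.sub(r'[^0-9.\*.\*]', '', value)

# \.{2} matches exactly two consecutive literal dots.
with_answer_pattern = re.sub(r'\.{2}', '', value)

print(with_question_pattern)  # 29261..
print(with_answer_pattern)    # 29261
```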

CodePudding user response:

Use str.replace and assign the result back to the specific column:

import pandas as pd

df=[{
    "UNIQUESERIALNO":"abcd123",
    "Kilometer":"29261.."    
}]

df=pd.DataFrame.from_dict(df)
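# regex=False treats '.' as a literal dot; as a regex, '.' would match any character
df['Kilometer'] = df['Kilometer'].str.replace('.', '', regex=False)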

print(df)
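
One thing to keep in mind with a literal replacement of '.' is that it removes every dot, not only the trailing pair. The sketch below uses plain Python strings and the re module instead of pandas, and the value "123.45.." is a hypothetical example that does not appear in the question. It is only meant to show the difference if the column could ever hold decimal values.

```python
import re

# Hypothetical value with a decimal point AND trailing dots (not from the question).
value = "123.45.."

# Literal replacement removes every dot, including the decimal point.
all_dots_removed = value.replace(".", "")

# Anchoring the pattern to the end of the string removes only the trailing dots.
trailing_dots_removed = re.sub(r'\.+$', '', value)

print(all_dots_removed)       # 12345
print(trailing_dots_removed)  # 123.45
```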

CodePudding user response:

If you want to replace those two periods only in that specific context (at the end of the value, immediately after a digit), you can use a look-behind in the regex:

df['Kilometer'] = df['Kilometer'].replace(r'(?<=\d)\.\.$', # (?<=\d) means "after a digit"
                                          '', regex=True)
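
To see which values the look-behind pattern touches, here is a small sketch with the standard re module (used instead of pandas only to keep it self-contained). Apart from "29261..", the sample values are hypothetical and chosen to show cases the pattern leaves alone.

```python
import re

# (?<=\d) requires a digit right before the two dots; $ anchors them to the end.
pattern = re.compile(r'(?<=\d)\.\.$')

# Only the first value comes from the question; the others are illustrative.
samples = ["29261..", "123.45", "n/a..", "77..."]

cleaned = [pattern.sub('', s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Only "29261.." changes here. "n/a.." is kept because the character before the dots is not a digit, and "77..." is kept because the final two dots are preceded by another dot rather than a digit.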