I have a pandas DataFrame with 4 columns. I need to strip the trailing digits from the values in the Country column that have them:
                     Country  Energy
56  Central African Republic      23
57                      Chad      77
58                     Chile    1613
59                    China2  127191
60                 Hong Kong     585
75                  Denmark5     725
CodePudding user response:
I'd write a function to remove digits at the end of the string and apply it to that specific column:
import string

import pandas as pd

def remove_digits(country):
    # Strip any run of digits from the right-hand end of the string
    return country.rstrip(string.digits)

df = pd.DataFrame({'country': ['China2', 'Hong Kong'], 'energy': [127191, 585]})
print(df)
# Apply the helper to every value in the 'country' column
df['country'] = df['country'].apply(remove_digits)
print('\n', df)
This will return:
     country  energy
0     China2  127191
1  Hong Kong     585

      country  energy
0      China  127191
1  Hong Kong     585
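The same stripping can probably be done without a separate helper function. Here is a minimal sketch, assuming pandas' vectorized `Series.str.rstrip`, which takes the set of characters to strip just like the built-in `str.rstrip`. The three-row frame is only sample data taken from the question.

```python
import string

import pandas as pd

# Sample data based on the rows shown in the question
df = pd.DataFrame({'country': ['China2', 'Hong Kong', 'Denmark5'],
                   'energy': [127191, 585, 725]})

# Strip trailing digits from every value in the column in one call
df['country'] = df['country'].str.rstrip(string.digits)

print(df['country'].tolist())  # ['China', 'Hong Kong', 'Denmark']
```

One possible reason to prefer this form: the `.str` accessor passes missing values (NaN) through unchanged, whereas `apply(remove_digits)` would raise on a float NaN because floats have no `rstrip` method.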
CodePudding user response:
Most simply, I'd use regex to replace digits at the end with an empty string:
import pandas as pd

df = pd.DataFrame({'Country': ['China2', 'Hong Kong', 'Ireland43'], 'energy': [127191, 585, 999]})
print(df)
# \d+$ matches one or more digits anchored at the end of the string
df['Country'] = df['Country'].str.replace(r'\d+$', '', regex=True)
print('\n', df)
This returns:
     Country  energy
0     China2  127191
1  Hong Kong     585
2  Ireland43     999

      Country  energy
0      China  127191
1  Hong Kong     585
2    Ireland     999
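To remove digits only at the end of a name, the pattern needs a `$` anchor. Without it, digits anywhere in the string are removed. The sketch below compares an anchored and an unanchored pattern. The value 'Area51 Zone9' is made up purely for illustration and is not from the question's data.

```python
import pandas as pd

# 'Area51 Zone9' is a hypothetical value with digits in the middle and at the end
names = pd.Series(['China2', 'Ireland43', 'Area51 Zone9', 'Hong Kong'])

# Anchored: only the run of digits at the very end is removed
trailing_only = names.str.replace(r'\d+$', '', regex=True)

# Unanchored: every digit is removed, wherever it appears
all_digits = names.str.replace(r'\d', '', regex=True)

print(trailing_only.tolist())  # ['China', 'Ireland', 'Area51 Zone', 'Hong Kong']
print(all_digits.tolist())     # ['China', 'Ireland', 'Area Zone', 'Hong Kong']
```

For the country names in the question both patterns give the same result, so the difference only matters if a name can contain digits that should be kept.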