Sample DataFrame:
import pandas as pd

df = pd.DataFrame({'City': ['York New', 'Parague', 'New Delhi', 'Venice', 'new Orleans'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy', 'Tech_Summit'],
                   'Cost': [10000, 5000, 15000, 2000, 12000]})
# Use monthly periods as the row index
index_ = [pd.Period('02-2018'), pd.Period('04-2018'),
          pd.Period('06-2018'), pd.Period('10-2018'), pd.Period('12-2018')]
df.index = index_
print(df)
Problem statement: for cities whose names start with the keyword 'New' or 'new', change that prefix to 'New_'.
First, I created a new column in the DataFrame that records whether the city name contains "new" and, if so, at what position:
df["pos"] = df["City"].apply(lambda x: x.lower().find("new"))
Then I wrote a function that replaces "New" or "new" with "New_" when it appears at the start of the city name:
def replace_new(city, pos):
    if pos == 0:
        return city.replace("[Nn]ew", "New_", regex = True)
    else:
        return city
df = df[["City","pos"]].apply(replace_new, axis = 1)
When I execute the line above, I get this error:
"("replace_new() missing 1 required positional argument: 'pos'", 'occurred at index 2018-02')"
What am I doing wrong here? Please help
CodePudding user response:
Use str.replace with a regex:
df['City'] = df['City'].str.replace(r'^new\s*', 'New_', case=False, regex=True)
Output:
                City        Event   Cost
2018-02     York New        Music  10000
2018-04      Parague       Poetry   5000
2018-06    New_Delhi      Theatre  15000
2018-10       Venice       Comedy   2000
2018-12  New_Orleans  Tech_Summit  12000
Regex:
^     # match the start of the string
new   # match "new" (in any letter case, because of case=False)
\s*   # match zero or more whitespace characters
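As a quick check of the pattern above, here is a minimal self-contained sketch. It uses the city names from the question; the variable names `cities` and `renamed` are illustrative only and do not come from the thread.

```python
import pandas as pd

# City names copied from the question's sample data
cities = pd.Series(['York New', 'Parague', 'New Delhi', 'Venice', 'new Orleans'])

# ^ anchors the match at the start of each string, so 'York New' is left alone.
# case=False lets 'new' match both 'New' and 'new'.
# \s* also consumes the space after the keyword, giving 'New_Delhi' rather than 'New_ Delhi'.
renamed = cities.str.replace(r'^new\s*', 'New_', case=False, regex=True)

print(renamed.tolist())
# ['York New', 'Parague', 'New_Delhi', 'Venice', 'New_Orleans']
```

Note that this pattern would also rewrite a name such as 'Newport' to 'New_port'. If only the whole word should match, adding a word boundary (for example r'^new\b\s*') is one possible adjustment.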
CodePudding user response:
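Call the function this way, so that each row is unpacked into the two arguments it expects, and assign the result back to the City column:
df['City'] = df.apply(lambda x: replace_new(x.City, x.pos), axis=1)
Also, Python's built-in str.replace treats its pattern as literal text and has no regex argument, so use the re module instead: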
import re

def replace_new(city, pos):
    if pos == 0:
        return re.sub('[Nn]ew', 'New_', city)
    else:
        return city
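For reference, a minimal end-to-end sketch of this row-wise approach, assuming the `pos` column from the question. With `axis=1`, `apply` passes each row to the function as a single Series, which is why the original call complained about a missing `pos` argument. The anchored pattern with `\s*` is a variation on the code above, not part of the answer: without it, 'New Delhi' would become 'New_ Delhi' with the space kept.

```python
import re

import pandas as pd

df = pd.DataFrame({'City': ['York New', 'Parague', 'New Delhi', 'Venice', 'new Orleans']})

# Position of "new" in the lower-cased name, or -1 if it is absent
df['pos'] = df['City'].apply(lambda x: x.lower().find('new'))


def replace_new(city, pos):
    # Only rewrite names where "new" was found at the very start.
    if pos == 0:
        # Variation (assumption): anchor at the start and swallow the following whitespace.
        return re.sub(r'^[Nn]ew\s*', 'New_', city)
    return city


# The lambda receives one row (a Series) and unpacks it into two arguments.
df['City'] = df.apply(lambda row: replace_new(row.City, row.pos), axis=1)

print(df['City'].tolist())
# ['York New', 'Parague', 'New_Delhi', 'Venice', 'New_Orleans']
```

Assigning to `df['City']` rather than to `df` keeps the other columns intact, since `apply` with `axis=1` returns a Series here.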