During data entry, Some of the States were added to the same cell as the Address line. The city and state vary and are generally unknown. There are also some cases of a ,
that would need to be removed.
AddrLines AddrCity AddrState
0 123 street Titusville FL NaN
1 456 road Viera FL NaN
2 789 place Melbourne, Fl NaN
3 001 ave Wright VA
My goal is to clean up the City column while moving the state over to the State column. Something like this, but removing the state and the ,
at the same time.
df.loc[(df['AddrCity'].str.contains(' Fl')),'AddrState'] = 'FL'
df.loc[(df['AddrCity'].str.contains(' FL')),'AddrState'] = 'FL'
df.loc[(df['AddrCity'].str.contains(', Fl')),'AddrState'] = 'FL'
CodePudding user response:
I was able to get this by performing the following
df1[['AddrCity', 'State_Holder']] = df1['AddrCity'].str.replace(', ', ' ').str.replace(' ', ', ').str.split(', ', 1, expand = True)
df1['AddrState'] = np.where(df1['AddrState'].isna(), df1['State_Holder'], df1['AddrState'])
df1.drop(columns = ['State_Holder'])