How to spot-delete spaces in a pandas column

Time:06-26

I have a dataframe with a column named location, which looks like this:

[Screenshot: sample rows of the location column]

The screenshot shows a case with 5 spaces in the location column, but there are many more cells with 3 or 4 spaces. The most common case has just two spaces: one between the city and the state, and one between the state and the post code.

I need to run str.split() on the location column, but the varying number of spaces gets in the way: if I replace every space with an empty string or a comma, different rows end up with a different number of split points.

So I need a way to turn the spaces inside city names into hyphens, so that I can do the split later, while leaving the other spaces untouched (the one between city and state, and the one between state and post code). Any ideas?

CodePudding user response:

I wrote this code with readability in mind. One way to solve this is to split the location column at the comma into a city part and a state part (the state part keeps the post code), apply the replacement to the city only, and then join the two parts back together.

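import pandas as pd

df = pd.DataFrame({'location': ['Cape May Court House, NJ 08210', 'Van Buron Charter Township, MI 48111']})

# Split once at the comma: city on the left, state and post code on the right
df[['city', 'state']] = df['location'].str.split(',', n=1, expand=True)

# Replace the spaces inside the city name only (use '-' here if hyphens are wanted)
df['city'] = df['city'].str.replace(' ', '_', regex=False)

# Rejoin the city with the untouched state and post code
df['location_new'] = df['city'] + ',' + df['state']

df.head()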

The required result ends up in the location_new column: for the two sample rows it contains 'Cape_May_Court_House, NJ 08210' and 'Van_Buron_Charter_Township, MI 48111'.
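
If the end goal is the split itself, another option might be to split from the right instead of patching the city first. The sketch below assumes every row ends with a state token and a post code token separated by whitespace, which matches the two sample rows but has not been checked against the real data. The column names city, state and zip are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "location": [
        "Cape May Court House, NJ 08210",
        "Van Buron Charter Township, MI 48111",
    ]
})

# Split on whitespace from the right, at most twice: the last two tokens
# are taken as state and post code, everything before them as the city.
parts = df["location"].str.rsplit(n=2, expand=True)
parts.columns = ["city", "state", "zip"]  # illustrative names, not from the question

# Drop the trailing comma left on the city, then hyphenate its inner spaces.
parts["city"] = parts["city"].str.rstrip(",").str.replace(" ", "-", regex=False)

df = df.join(parts)
print(df[["city", "state", "zip"]])
```

Because the split is anchored on the right, the number of spaces inside the city name no longer matters, and the hyphenation step becomes optional.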
