I have a dataset and I am trying to split the values of a column location. The dataset I have is:- Dataset I have
The dataset have 56 null values so I get the indexes of those null values using below code:-
nan = []
for i in range(len(data['location'])):
if type(data['location'][i]) == float:
nan.append(i)
Once done I ran an another loop:-
for i in range(len(data['location'])):
if i in nan:
data['city'] = np.nan
else:
data['city'] = data['location'][i].split(',')[1]
This giving me an error saying,
IndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15176/2022247788.py in <module>
3 data['city'] = np.nan
4 else:
----> 5 data['city'] = data['location'][i].split(',')[1]
IndexError: list index out of range
Though it is giving me values but it's not giving correct values. As seen in the location first value is NaN so I want NaN in the city and the second value in the column is Canandaigua, NY so I wish to have NY in the city.
I also tried them to split directly using the below code:-
data[['town','city2']] = data['location'].str.split(',',expand=True)
but getting an error:-
ValueError: Columns must be the same length as key
CodePudding user response:
You can do something like this to get the city to another column
data['city'] = data.location.str.split(",").str[1]
This will return either the city, or NaN if not available
Edit: Try this then.
data['city'] = data[~data.location.isna()].location.str.split(",").apply(lambda x: x[0] if len(x) == 1 else x[1])
This checks if length of the split string is just one then returns the string as is. If else, returns the second value.
CodePudding user response:
This should work too
### Comma Condition
comma_condtn = (df['location'].str.contains(',')) & (df['location'].notna())
### Extract city
df.loc[comma_condtn, 'city_2'] = df['location'].apply(lambda x : str(x).split(',').pop())
### Condition without commas
df.loc[df['city_2'].isna(), 'city_2'] = df['location']