Home > Software engineering >  Unable to split two values of a column
Unable to split two values of a column

Time:06-05

I have a dataset and I am trying to split the values of a column location. The dataset I have is:- Dataset I have

The dataset have 56 null values so I get the indexes of those null values using below code:-

nan = []
for i in range(len(data['location'])):
    if type(data['location'][i]) == float:
        nan.append(i)

Once done I ran an another loop:-

for i in range(len(data['location'])):
    if i in nan:
        data['city'] = np.nan
    else:
        data['city'] = data['location'][i].split(',')[1]

This giving me an error saying,

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15176/2022247788.py in <module>
      3         data['city'] = np.nan
      4     else:
----> 5         data['city'] = data['location'][i].split(',')[1]

IndexError: list index out of range

Though it is giving me values but it's not giving correct values. As seen in the location first value is NaN so I want NaN in the city and the second value in the column is Canandaigua, NY so I wish to have NY in the city.

I also tried them to split directly using the below code:-

data[['town','city2']] = data['location'].str.split(',',expand=True)

but getting an error:-

ValueError: Columns must be the same length as key

CodePudding user response:

You can do something like this to get the city to another column

data['city'] = data.location.str.split(",").str[1]

This will return either the city, or NaN if not available

Edit: Try this then.

data['city'] = data[~data.location.isna()].location.str.split(",").apply(lambda x: x[0] if len(x) == 1 else x[1])

This checks if length of the split string is just one then returns the string as is. If else, returns the second value.

CodePudding user response:

This should work too

### Comma Condition
comma_condtn = (df['location'].str.contains(',')) & (df['location'].notna())

### Extract city
df.loc[comma_condtn, 'city_2'] = df['location'].apply(lambda x : str(x).split(',').pop())

### Condition without commas
df.loc[df['city_2'].isna(), 'city_2'] = df['location']
  • Related