Add a list column-CodePudding

I have a dataset with that has the two biggest cities in each country in Europe. I want to make a column as a list of the two cities put together. below is an example of how the dataset looks.

Country  BiggestCity   SecondBiggestCity
England  London        Birmingham
Spain    Madrid        Barcelona
Sweden   Stockholm     Gothenburg
Germany  Berlin        Frankfurt

This is what I've tried:

df['BothCities'] = df['BiggestCity']   df['SecondBiggestCity']

but it doesnt give me the desired output. I would like to add an additional column BothCities which would have the two cities in a list.

Desired Output:

Country  BiggestCity   SecondBiggestCity   BothCities
England  London        Birmingham          [London, Birmingham]
Spain    Madrid        Barcelona           [Madrid, Barcelona]
Sweden   Stockholm     Gothenburg          [Stockholm, Gothenburg]
Germany  Berlin        Frankfurt           [Berlin, Frankfurt]

Any recommendations on how to do that?

CodePudding user response：

Try this:

df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].values.tolist()

print(df)

   Country BiggestCity SecondBiggestCity               BothCities
0  England      London        Birmingham     [London, Birmingham]
1    Spain      Madrid         Barcelona      [Madrid, Barcelona]
2   Sweden   Stockholm        Gothenburg  [Stockholm, Gothenburg]
3  Germany      Berlin         Frankfurt      [Berlin, Frankfurt]

No need for it but if you wanted to solve it your way you could achieve it like this:

What you have tried is only part of your goal. You just concatenate two strings to one big string. You would concatenate them with a separator (e.g '/')

df['BothCities'] = df['BiggestCity']   '/'   df['SecondBiggestCity']

and then split it to a list based on that separator

df['BothCities'] = df['BothCities'].str.split('/')

CodePudding user response：

You need concat with list

df['BothCities'] = df['BiggestCity'].apply(lambda x: [x])   df['SecondBiggestCity'].apply(lambda x: [x])

Or you can try apply list on rows

df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].apply(list, axis=1)

print(df)

   Country BiggestCity SecondBiggestCity               BothCities
0  England      London        Birmingham     [London, Birmingham]
1    Spain      Madrid         Barcelona      [Madrid, Barcelona]
2   Sweden   Stockholm        Gothenburg  [Stockholm, Gothenburg]
3  Germany      Berlin         Frankfurt      [Berlin, Frankfurt]

You can also try with zip

df['BothCities'] = list(map(list, zip(df['BiggestCity'], df['SecondBiggestCity'])))

CodePudding user response：

If you want it simple, you can do a simple list comprehension with zip.

df["BothCities"] = [[i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity'])]

Or tuple for faster performance

df["BothCities"] = tuple([i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity']))

For your reference (with 1000 rows of data)

%timeit [[i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity'])]
304 µs ± 9.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit tuple([i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity']))
298 µs ± 6.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)