I have a dataset with that has the two biggest cities in each country in Europe. I want to make a column as a list of the two cities put together. below is an example of how the dataset looks.
Country BiggestCity SecondBiggestCity
England London Birmingham
Spain Madrid Barcelona
Sweden Stockholm Gothenburg
Germany Berlin Frankfurt
This is what I've tried:
df['BothCities'] = df['BiggestCity'] df['SecondBiggestCity']
but it doesnt give me the desired output. I would like to add an additional column BothCities
which would have the two cities in a list.
Desired Output:
Country BiggestCity SecondBiggestCity BothCities
England London Birmingham [London, Birmingham]
Spain Madrid Barcelona [Madrid, Barcelona]
Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
Germany Berlin Frankfurt [Berlin, Frankfurt]
Any recommendations on how to do that?
CodePudding user response:
Try this:
df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].values.tolist()
print(df)
Country BiggestCity SecondBiggestCity BothCities
0 England London Birmingham [London, Birmingham]
1 Spain Madrid Barcelona [Madrid, Barcelona]
2 Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
3 Germany Berlin Frankfurt [Berlin, Frankfurt]
No need for it but if you wanted to solve it your way you could achieve it like this:
What you have tried is only part of your goal. You just concatenate two strings to one big string. You would concatenate them with a separator (e.g '/'
)
df['BothCities'] = df['BiggestCity'] '/' df['SecondBiggestCity']
and then split it to a list based on that separator
df['BothCities'] = df['BothCities'].str.split('/')
CodePudding user response:
You need concat with list
df['BothCities'] = df['BiggestCity'].apply(lambda x: [x]) df['SecondBiggestCity'].apply(lambda x: [x])
Or you can try apply
list
on rows
df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].apply(list, axis=1)
print(df)
Country BiggestCity SecondBiggestCity BothCities
0 England London Birmingham [London, Birmingham]
1 Spain Madrid Barcelona [Madrid, Barcelona]
2 Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
3 Germany Berlin Frankfurt [Berlin, Frankfurt]
You can also try with zip
df['BothCities'] = list(map(list, zip(df['BiggestCity'], df['SecondBiggestCity'])))
CodePudding user response:
If you want it simple, you can do a simple list comprehension with zip.
df["BothCities"] = [[i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity'])]
Or tuple for faster performance
df["BothCities"] = tuple([i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity']))
For your reference (with 1000 rows of data)
%timeit [[i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity'])]
304 µs ± 9.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit tuple([i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity']))
298 µs ± 6.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)