I have a dataset with data of a set of bands who play in the same two-city festival. I aim to create a pandas dataframe of two colums, Band and total attendance. but I am unable to use the df.groupby argument as the same bands are mentioned in two different columns. Any tips? Below is an example of how the dataset looks.
F.ex
Time Band_in_London Band_in_Reading Attendance_London Attendance_Reading
12 Kasabian Queen 3000 4000
13 Foo Fighters Beatles 1000 5000
14 U2 A-ha 2500 2500
15 Queen Kasabian 2200 1000
16 Beatles Foo Fighters 1300 4700
17 A-Ha U2 4000 3100
18
CodePudding user response:
Using pd.wide_to_long
:
import pandas as pd
# Notice there is a typo for the A-ha.
df = df.replace({'A-ha':'A-Ha'})
attendance = (pd.wide_to_long(df, ['Attendance', 'Band_in'], 'Time', 'City', sep='_', suffix=r'\w ').
groupby('Band_in').sum()
)
print(attendance)
Attendance
Band_in
A-Ha 6500
Beatles 6300
Foo Fighters 5700
Kasabian 4000
Queen 6200
U2 5600
CodePudding user response:
df = original_dataframe[['Band_in_London', 'Attendance_London']]
df = df.rename({'Band_in_London': 'Band', 'Attendance_London': 'Attendance'}, axis=1)
df = df.append(
original_dataframe[['Band_in_Reading', 'Attendance_Reading']].rename(
{'Band_in_Reading': 'Band', 'Attendance_Reading': 'Attendance'}, axis=1))
df = df.groupby(['Band']).sum()