df.groupby('arrival_date_year').market_segment.value_counts()
I used the above code to get the following results:
arrival_date_year  market_segment
2015               Online TA          6165
                   Groups             6100
                   Offline TA/TO      6079
                   Direct             2314
                   Corporate          1171
                   Complementary       165
                   Undefined             2
2016               Online TA         27661
                   Offline TA/TO     12473
                   Groups             7857
                   Direct             5663
                   Corporate          2562
                   Complementary       364
                   Aviation            127
2017               Online TA         22651
                   Groups             5854
                   Offline TA/TO      5667
                   Direct             4629
                   Corporate          1562
                   Complementary       214
                   Aviation            110
What is the correct syntax to plot multiple lines, one per market segment, with x axis = year and y axis = value count?
Alternatively, how do I get the data in wide format so that it's easier to compare the counts side by side?
CodePudding user response:
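To compare the counts side by side, how about unstacking market_segment into columns (segments absent in a year, such as Aviation in 2015, come out as NaN unless you pass fill_value=0 to unstack):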
df.groupby('arrival_date_year').market_segment.value_counts().unstack().reset_index()
For example, with a small DataFrame:
df
Out[257]:
    X   Y    Z
0   a  10  100
1   b  20  200
2   a  30  300
3   b  10  400
4   a  20  100
5   b  30  200
6   a  10  300
7   b  20  400
8   a  30  100
9   b  10  200
10  a  20  300
11  b  30  400
df.groupby('X').Y.value_counts().unstack().reset_index()
Out[258]:
Y  X  10  20  30
0  a   2   2   2
1  b   2   2   2
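For the plotting part of the question, the same wide table can probably be passed straight to DataFrame.plot, which draws one line per column. This is only a minimal sketch: the small df below is made-up stand-in data (not the real bookings), the file name market_segments.png is arbitrary, and the reset_index() step is left out so that the year stays in the index and ends up on the x axis.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in for the bookings DataFrame (made-up rows).
df = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2015, 2016, 2016, 2016, 2016, 2017, 2017],
    "market_segment": ["Online TA", "Groups", "Online TA",
                       "Online TA", "Aviation", "Groups", "Online TA",
                       "Online TA", "Aviation"],
})

# Wide format: one row per year, one column per market segment.
# fill_value=0 replaces the NaN that unstack would otherwise leave
# for segments that never occur in a given year.
wide = (
    df.groupby("arrival_date_year")
      .market_segment.value_counts()
      .unstack(fill_value=0)
)

# One line per column (market segment); the index (year) is the x axis.
ax = wide.plot(kind="line", marker="o")
ax.set_xlabel("arrival_date_year")
ax.set_ylabel("value count")
ax.set_xticks(wide.index)  # show whole years only, no 2015.5 ticks

# Save to a file; in an interactive session plt.show() works as well.
ax.figure.savefig("market_segments.png")
```

Keeping the year in the index is what makes it the x axis; if you have already called reset_index(), wide.plot(x='arrival_date_year') should give the same picture.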