My data looks smth like:
Index | Job | Y | Balance |
---|---|---|---|
1 | A | Yes | 1 |
2 | B | No | 2 |
3 | A | No | 5 |
4 | A | No | 0 |
5 | B | Yes | 4 |
I want to summarize the data in the following format, with job in the row and Y in the column:
Yes | No | |
---|---|---|
A | 1 | 2 |
B | 1 | 1 |
I have tried the following code:
pivot = df.pivot_table(index =['job'], columns = ['y'], values = ['balance'], aggfunc ='count')
I am not able to run the pivot without using balance in the value parameter. How do I get the above result?
CodePudding user response:
You can try this:
data = {'Index': [1, 2, 3, 4, 5],
'job': ['A', 'B', 'A', 'A', 'B'],
'y': ['Yes', 'No', 'No', 'No', 'Yes'],
'Balance': [1, 2, 5, 0, 4]}
df = pd.DataFrame(data)
pivot = df.groupby(['job', 'y']).size().unstack(fill_value=0)
print(pivot)
CodePudding user response:
To be able to do this, you will need to do df.groupby()
first, to group the data on Job and Y columns to get the count of yes/no using the below code:
df2 = df.groupby(['Job', 'Y'], as_index=False).count()
Job Y Balance
0 A No 2
1 A Yes 1
2 B No 1
3 B Yes 1
You can then use df2.pivot()
to pivot this grouped table into the desired format:
df2.pivot(index='Job', columns='Y', values='Balance')
Y No Yes
Job
A 2 1
B 1 1