I'm currently using Pandas and the dataframe is a csv file and I need to use a date column to create to additional columns:
On the date column I have filtered it with the following to get 2022/2021 information only using:
df = df.loc[df["Date"].between("2022", "2021")]
currently I have:
Date | Type | Initial | Number |
---|---|---|---|
2022 | Bin | S | 5 |
2022 | Bin | S | 6 |
2022 | Bin | S | 9 |
2021 | Bin | B | 5 |
2021 | Bin | B | 7 |
2021 | Bin | B | 0 |
I am currently trying to get the following output:
Type | 2022 | 2021 | Initial | Difference |
---|---|---|---|---|
Bin | 20 | 12 | S/B | 8 |
CodePudding user response:
If I got your question, this may help
df = df.groupby(by=['Date']).sum()
df['Difference'] = df.diff()['Number']
CodePudding user response:
here is my attempt:
dfx = df.groupby(['Type ','Date ']).agg({'Initial ':lambda x: '/'.join(x.unique()),'Number':'sum'}).reset_index()
output:
Type Date Initial Number
Bin 2021 B 12
Bin 2022 S 20
then:
dfy = dfx.pivot(index=['Type ','Initial '], columns='Date ', values='Number').reset_index()
output:
Type Initial 2021 2022
Bin B 12.0
Bin S 20.0
then:
dfz = dfy.groupby('Type ').agg({'Initial ':lambda x: '/'.join(x.unique()),2021:'sum',2022:'sum'}).reset_index()
output:
Type Initial 2021 2022
Bin B /S 12.0 20.0
now you can add difference:
dfz['difference'] = dfz[2022] - dfz[2021]
output:
Type Initial 2021 2022 difference
Bin B /S 12.0 20.0 8.0