I have a CSV file of a tea exporting company that I have imported to Jupyter and the sample output of the file is like below.
Date Type Weight Quantity Price
2016-01-01 black bags 1734136.51 1131.30
2016-01-01 black bulk 10722389.66 510.86
2016-01-01 green 4g_1kg 6817078.01 588.72
2016-01-01 instant 1kg_3kg 86444.50 565.91
2016-01-01 black 3kg_5kg 1003986.73 552.39
I'm expecting to get a output grouped by each month at the first column and then type the next column and then weight categories in the next and total of each quantity in the next column. Just like this.
YearMonth Type Weight Quantity Price
201601 black bags 17341 1131.30
1kg_3kg 107223 510.86
3kg_5kg 107223 510.86
green bulk 107223 510.86
1kg_3kg 107223 510.86
3kg_5kg 107223 510.86
instant 4g_1kg 6817078.01 588.72
3kg_5kg 107223 510.86
3kg_5kg 107223 510.86
201602 black 1kg_3kg 86444.50 565.91
This is the code I have tried so far but I don't know how to group by the type and weight classes together.
# to get the date and convert it to YEARMONTH format
data['Date'] = pd.to_datetime(data['Date']).dt.date
data['YearMonth'] = data['Date'].map(lambda date: 100*date.year date.month)
# calculate the total each month
data = data.groupby(['YearMonth'])['Quantity'].sum().reset_index()
data
It shoes me something like this
YearMonth Quantity
0 201601 23478578.08
1 201602 22988482.20
2 201603 25909619.24
3 201604 21687809.15
4 201605 20776964.81
Is there any way I can achieve this by grouping them all together?
CodePudding user response:
You can do it by the following:
1) Create your DataFrame
, I used the dummy data you provided but I changed 2 months in order to see the result:
# Create a dictionary of data for the DataFrame
data = {'Date': ['2016-01-01', '2016-01-01', '2016-02-01', '2016-02-02', '2016-01-01'],
'Type': ['black', 'black', 'green', 'instant', 'black'],
'Weight': ['bags', 'bulk', '4g_1kg', '1kg_3kg', '3kg_5kg'],
'Quantity': [1734136.51, 10722389.66, 6817078.01, 86444.50, 1003986.73],
'Price': [1131.30, 510.86, 588.72, 565.91, 552.39]}
# Create the DataFrame
df = pd.DataFrame(data)
2) Convert your DataFrame.index
to datetime
using your Date column:
df.index = pd.to_datetime(df["Date"],format='%Y-%m-%d')
3) Group and aggregate your dataframe accordingly:
df = df.groupby(by=[df.index.year, df.index.month, 'Type', 'Weight']).agg({'Quantity':'sum', 'Price':'first'})
4) Renname your index.names
for proper names:
df.index.names = ['Year', 'Month', 'Type', 'Weight']
Result will be as follows:
------ ------ ------- -------- ------------ -------
| Year| Month| Type| Weight| Quantity| Price|
------ ------ ------- -------- --------------------
| 2016| 1| black| 3kg_5kg| 1003986.73| 552.39|
| | | | bags| 1734136.51| 1131.3|
| | | | bulk| 10722389.66| 510.86|
| | 2| green| 4g_1kg| 6817078.01| 588.72|
| | |instant| 1kg_3kg| 86444.50| 565.91|
------ ------ ------ --------- ------------ -------
I see you used reset_index
so you can use it here as well to get the following result:
df.reset_index()
- ------ ------ ------- -------- ------------ -------
| | Year| Month| Type| Weight| Quantity| Price|
- ------ ------ ------- -------- --------------------
|0| 2016| 1| black| 3kg_5kg| 1003986.73| 552.39|
|1| 2016| 1| black| bags| 1734136.51| 1131.3|
|2| 2016| 1| black| bulk| 10722389.66| 510.86|
|3| 2016| 2| green| 4g_1kg| 6817078.01| 588.72|
|4| 2016| 2|instant| 1kg_3kg| 86444.50| 565.91|
- ------ ------ ------ --------- ------------ -------
Note that if you don't want to keep the old index, you can use
df.reset_index(drop=True)