I have been trying to work on this dataset that includes quantities for two types of sales (0,1) for different counties across different dates. Some dates, however, include both type 1 and type 0 sales. How can I merge type 1 and 0 sales for the same date and same id? The dataset has over 40k rows and I have no idea where to start. I was thinking about creating an if loop but I have no idea how to write it. It can be in python or R.
Essentially, I have a table that looks like this:
Date | City | Type | Quantity |
---|---|---|---|
2020-01-01 | Rio | 1 | 10 |
2020-01-01 | Rio | 0 | 16 |
2020-03-01 | Rio | 0 | 23 |
2020-03-01 | Rio | 1 | 27 |
2020-05-01 | Rio | 1 | 29 |
2020-08-01 | Rio | 0 | 36 |
2020-01-01 | Sao Paulo | 0 | 50 |
2020-01-01 | Sao Paulo | 1 | 62 |
2020-03-01 | Sao Paulo | 0 | 30 |
2020-04-01 | Sao Paulo | 1 | 32 |
2020-05-01 | Sao Paulo | 0 | 65 |
2020-05-01 | Sao Paulo | 1 | 155 |
I want to combine, for example, Rio's quantities for both type 1 and 0 on 2020-01-01, as well as 2020-03-01, and the same thing for Sao Paulo and all subsequent counties. I want to aggregate types 1 and 0 quantities but still preserve the date and city columns.
CodePudding user response:
You can use the pandas groupby
and agg
functions to do this operation. Here is some example code:
import pandas as pd
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000', '3/10/2000'],
'id':[0,1,0,0], 'sale_type':[0,0,0,1], 'amount': [2, 3, 4, 2]})
df['date'] = pd.to_datetime(df['date'])
df.groupby(['date', 'id']).agg({'amount':sum})
>>> amount
date id
2000-03-10 0 4
2000-03-11 1 3
2000-03-12 0 4
CodePudding user response:
My version of code:
# -*- coding: utf-8 -*-
import pandas as pd
# generating a sample dataframe
df = pd.DataFrame([['10-01-2020', 311100, 'ABADIA', 'MG', 'MINAS', 'IVERMECTIONA', 0, 68],
['10-01-2020', 311100, 'ABADIA', 'MG', 'MINAS', 'IVERMECTIONA', 1, 120]],
columns=['date', 'code1', 'code2', 'code3', 'code4', 'code5', 'type_of_sales', 'count_sales'])
# printing content of dataframe
print(df)
# using group by operation over columns we want to see in resultset and aggregating additive columns
df = df.groupby(['date', 'code1', 'code2', 'code3', 'code4', 'code5']).agg({'count_sales': ['sum']})
# aligning levels of column headers
df = df.droplevel(axis=1, level=0).reset_index()
# renaming column name to previous after aggregating
df = df.rename(columns={'sum':'count_sales'})
print(df)
CodePudding user response:
Try something like this:
import pandas as pd
df = pd.read_csv('your_file_name.csv')
df.pivot_table(values='Sales', index=['Date', 'City'], aggfunc='sum')