Aggregating two values in a panel data with the same date and id but different type-CodePudding

I have been trying to work on this dataset that includes quantities for two types of sales (0,1) for different counties across different dates. Some dates, however, include both type 1 and type 0 sales. How can I merge type 1 and 0 sales for the same date and same id? The dataset has over 40k rows and I have no idea where to start. I was thinking about creating an if loop but I have no idea how to write it. It can be in python or R.

Essentially, I have a table that looks like this:

Date	City	Type	Quantity
2020-01-01	Rio	1	10
2020-01-01	Rio	0	16
2020-03-01	Rio	0	23
2020-03-01	Rio	1	27
2020-05-01	Rio	1	29
2020-08-01	Rio	0	36
2020-01-01	Sao Paulo	0	50
2020-01-01	Sao Paulo	1	62
2020-03-01	Sao Paulo	0	30
2020-04-01	Sao Paulo	1	32
2020-05-01	Sao Paulo	0	65
2020-05-01	Sao Paulo	1	155

I want to combine, for example, Rio's quantities for both type 1 and 0 on 2020-01-01, as well as 2020-03-01, and the same thing for Sao Paulo and all subsequent counties. I want to aggregate types 1 and 0 quantities but still preserve the date and city columns.

CodePudding user response：

You can use the pandas groupby and agg functions to do this operation. Here is some example code:

import pandas as pd 
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000', '3/10/2000'],
                    'id':[0,1,0,0], 'sale_type':[0,0,0,1], 'amount': [2, 3, 4, 2]})
df['date'] = pd.to_datetime(df['date'])
df.groupby(['date', 'id']).agg({'amount':sum})
>>> amount
date       id        
2000-03-10 0        4
2000-03-11 1        3
2000-03-12 0        4

CodePudding user response：

My version of code:

# -*- coding: utf-8 -*-
import pandas as pd

# generating a sample dataframe
df = pd.DataFrame([['10-01-2020', 311100, 'ABADIA', 'MG', 'MINAS', 'IVERMECTIONA', 0, 68],
                  ['10-01-2020', 311100, 'ABADIA', 'MG', 'MINAS', 'IVERMECTIONA', 1, 120]],
                  columns=['date', 'code1', 'code2', 'code3', 'code4', 'code5', 'type_of_sales', 'count_sales'])

# printing content of dataframe
print(df)

# using group by operation over columns we want to see in resultset and aggregating additive columns
df = df.groupby(['date', 'code1', 'code2', 'code3', 'code4', 'code5']).agg({'count_sales': ['sum']})

# aligning levels of column headers
df = df.droplevel(axis=1, level=0).reset_index()

# renaming column name to previous after aggregating
df = df.rename(columns={'sum':'count_sales'})

print(df)

CodePudding user response：

Try something like this:

import pandas as pd

df = pd.read_csv('your_file_name.csv')
df.pivot_table(values='Sales', index=['Date', 'City'], aggfunc='sum')