Home > OS >  SUM all data in the same date and group data in certain range
SUM all data in the same date and group data in certain range

Time:12-29

Let's say I have a dataframe

df = {
'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
'Value1': [9,12,7,8,15,16],
'Value2': [1,3,2,4,2,3]}

I want to sum Value2 in 2 groups, based on Value1(1-10,11-20) and date and create a new dataframe with 3 columns Date,1-10,11-20.

I also want the range be extendable like 1-10,11-20,21-30.....>100 so i am thinking of something like

for i in range(0, 100, 10)

but no idea how to put the date condition

so that the final result is like:

new_df = {
'Date': ['1-8','2-8'],
'1-10': [3,4],
'11-12':[3,5]}

CodePudding user response:

The following code should do what you expect:

import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
    'Value1': [9,12,7,8,15,16],
    'Value2': [1,3,2,4,2,3]
})

# create a range of values from 1 to 101, in increments of 10
value_range = range(1, 110, 10)

# create labels for the value range in the format '11-20'
labels = [f"{lower}-{upper-1}" for lower, upper in zip(value_range[:-1], value_range[1:])]

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, right=False, labels=labels)

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)

This code delivers code ranges up to 91-100. In order to only get relavant ranges you could modify the code like this:

import math
import pandas as pd

# create a new dataframe from the original data
df = pd.DataFrame({
    'Date': ['1-8','1-8','1-8', '2-8','2-8','2-8'],
    'Value1': [19,30,0, 10,15,16],
    'Value2': [1,3,2, 4,2,3]
})

# get the minimum and maximum values in Value1
lower_bound = df.Value1.min()
upper_bound = df.Value1.max()

# check if the lower bound is a multiple of 10
if lower_bound % 10 == 0:
    # if it is, subtract 10 from the lower bound
    lower_bound -= 10

# round down the lower bound to the nearest multiple of 10
lower_bound = math.floor(lower_bound/10)*10

# create a range of values from the lower bound to the upper bound, in increments of 10
value_range = range(lower_bound 1, upper_bound 10, 10)

# create labels for the value range in the format '11-20'
labels = [f"{lower}-{upper-1}" for lower, upper in zip(value_range[:-1], value_range[1:])]

# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, right=False, labels=labels)

# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()

# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')

print(new_df)
  • Related