Let's say I have a dataframe
df = {
'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
'Value1': [9,12,7,8,15,16],
'Value2': [1,3,2,4,2,3]}
I want to sum Value2 in 2 groups, based on Value1(1-10,11-20) and date and create a new dataframe with 3 columns Date,1-10,11-20.
I also want the range be extendable like 1-10,11-20,21-30.....>100 so i am thinking of something like
for i in range(0, 100, 10)
but no idea how to put the date condition
so that the final result is like:
new_df = {
'Date': ['1-8','2-8'],
'1-10': [3,4],
'11-12':[3,5]}
CodePudding user response:
The following code should do what you expect:
import pandas as pd
# create a new dataframe from the original data
df = pd.DataFrame({
'Date': ['1-8','1-8','1-8','2-8','2-8','2-8'],
'Value1': [9,12,7,8,15,16],
'Value2': [1,3,2,4,2,3]
})
# create a range of values from 1 to 101, in increments of 10
value_range = range(1, 110, 10)
# create labels for the value range in the format '11-20'
labels = [f"{lower}-{upper-1}" for lower, upper in zip(value_range[:-1], value_range[1:])]
# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, right=False, labels=labels)
# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()
# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')
print(new_df)
This code delivers code ranges up to 91-100
. In order to only get relavant ranges you could modify the code like this:
import math
import pandas as pd
# create a new dataframe from the original data
df = pd.DataFrame({
'Date': ['1-8','1-8','1-8', '2-8','2-8','2-8'],
'Value1': [19,30,0, 10,15,16],
'Value2': [1,3,2, 4,2,3]
})
# get the minimum and maximum values in Value1
lower_bound = df.Value1.min()
upper_bound = df.Value1.max()
# check if the lower bound is a multiple of 10
if lower_bound % 10 == 0:
# if it is, subtract 10 from the lower bound
lower_bound -= 10
# round down the lower bound to the nearest multiple of 10
lower_bound = math.floor(lower_bound/10)*10
# create a range of values from the lower bound to the upper bound, in increments of 10
value_range = range(lower_bound 1, upper_bound 10, 10)
# create labels for the value range in the format '11-20'
labels = [f"{lower}-{upper-1}" for lower, upper in zip(value_range[:-1], value_range[1:])]
# create a new column for the value range
df['Value Range'] = pd.cut(df['Value1'], value_range, right=False, labels=labels)
# group the data by date and value range
grouped_df = df.groupby(['Date', 'Value Range'])['Value2'].sum().reset_index()
# pivot the data to create the final dataframe
new_df = grouped_df.pivot(index='Date', columns='Value Range', values='Value2')
print(new_df)