Here is my dataframe
ID | Sum total | Sum partial |
---|---|---|
A1 | 40 | 25 |
A2 | 70 | 50 |
A3 | 100 | 40 |
I would like to build a third column based on the following pivot table given grid let's say:
Sum total interval\sum partial interval | 0-30 | 30-55 | 55-70 |
---|---|---|---|
0-50 | 0.10 | 0.17 | 0.22 |
50-75 | 0.14 | 0.18 | 0.25 |
75-100 | 0.20 | 0.27 | 0.38 |
Which would give this expected result:
ID | Sum total | Sum partial | Ratio given by grid |
---|---|---|---|
A1 | 40 | 25 | 0.10 |
A2 | 70 | 50 | 0.18 |
A3 | 100 | 40 | 0.27 |
I would like to know what is the most convenient way to do this?
Thank you,
CodePudding user response:
Try this using pd.IntervalIndex
and loc
:
import pandas as pd
df = pd.read_clipboard() # Copy input from question above
df1 = pd.read_clipboard() # Copy input from question above
df1 = df1.set_index(df1.columns[0]) #Create index with first column
#Create IntervalIndex from ranges in index and column headers
df1.columns = pd.IntervalIndex.from_arrays([0,30,55],[30,55,70])
df1.index = pd.IntervalIndex.from_arrays([0,50,75],[50,75,100])
#Test getting value by stacking columns and index to create multiIndex
df1.stack().loc[(40,25)] #returns .1
# Create list of tuples for "lookup" in df1 with zip
df['Ratio Give by grid'] = df1.stack().loc[zip(df['Sum total'], df['Sum partial'])].to_numpy()
df
Output:
ID Sum total Sum partial Ratio Give by grid
0 A1 40 25 0.10
1 A2 70 50 0.18
2 A3 100 40 0.27