Home > Net >  split dataframe rows according to ratios
split dataframe rows according to ratios

Time:06-21

I want to turn this dataframe

| ID| values|
|:--|:-----:|
| 1 | 10    |
| 2 | 20    |
| 3 | 30    |

into the below one by splitting the values according to the ratios 2:3:5

| ID| values|
|:--|:-----:|
| 1 | 2     |
| 1 | 3     |
| 1 | 5     |
| 2 | 4     |
| 2 | 6     |
| 2 | 10    |
| 3 | 6     |
| 3 | 9     |
| 3 | 15    |

Is there any simple code/convenient way to do this? Thanks!

CodePudding user response:

Let us do

df['new'] = (df['values'].to_numpy()[:,None]*[2,3,5]/10).tolist()
df = df.explode('new')
Out[849]: 
   ID  values   new
0   1      10   2.0
0   1      10   3.0
0   1      10   5.0
1   2      20   4.0
1   2      20   6.0
1   2      20  10.0
2   3      30   6.0
2   3      30   9.0
2   3      30  15.0

CodePudding user response:

Here is one approach:

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "values": [10, 20, 30]
})

ratios = [2, 3, 5]

df = (
    df.assign(values=df["values"].apply(lambda x: [x * (ratio / sum(ratios)) for ratio in ratios]))
    .explode("values")
)

print(df)

In essence, we aim to create cells with lists under the "values" column so that we can take advantage of a DataFrame's explode method which melts cells containing lists into individual cells.

To make these lists we use the apply method on the "values" Series (pandas term for column of a DataFrame). This function:

lambda x: [x * (ratio / sum(ratios)) for ratio in ratios]

is an anonymous function that receives a number and splits out a list of the ratios. For example, when x is 10:

10 * (2 / 10) = 2

10 * (3 / 10) = 3

10 & (5 / 10) = 5

Therefore [2, 3, 5]

Then for the next value:

20 * (2 / 10) = 4

20 * (3 / 10) = 6

20 * (5 / 10) = 10

Therefore [4, 6, 10]

etc., which results in the intermediate dataframe:

   ID            values
0   1   [2.0, 3.0, 5.0]
1   2  [4.0, 6.0, 10.0]
2   3  [6.0, 9.0, 15.0]

Using the explode method on this dataframe produces your desired result.

CodePudding user response:

here is one way to do it

ratio = [2, 3, 5]
ratio_dec = np.divide(ratio, sum(ratio))
df['ratio'] = df['values'].apply(lambda x: np.round(np.multiply(x, ratio_dec) ,0))
df.explode('ratio')
    ID  values  ratio
0   1   10  2.0
0   1   10  3.0
0   1   10  5.0
1   2   20  4.0
1   2   20  6.0
1   2   20  10.0
2   3   30  6.0
2   3   30  9.0
2   3   30  15.0

CodePudding user response:

Here's a way:

ratio = [2, 3, 5]
df = df.assign(**{f'ratio_{i}':df['values'] * x / sum(ratio) 
    for i, x in enumerate(ratio)}).set_index(['ID', 'values']).stack().to_frame(
    'values').reset_index(level=0).reset_index(drop=True)

Output:

   ID  values
0   1     2.0
1   1     3.0
2   1     5.0
3   2     4.0
4   2     6.0
5   2    10.0
6   3     6.0
7   3     9.0
8   3    15.0
  • Related