I want to turn this dataframe
| ID| values|
|:--|:-----:|
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
into the below one by splitting the values according to the ratios 2:3:5
| ID| values|
|:--|:-----:|
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 4 |
| 2 | 6 |
| 2 | 10 |
| 3 | 6 |
| 3 | 9 |
| 3 | 15 |
Is there any simple code/convenient way to do this? Thanks!
CodePudding user response:
Let us do
df['new'] = (df['values'].to_numpy()[:,None]*[2,3,5]/10).tolist()
df = df.explode('new')
Out[849]:
ID values new
0 1 10 2.0
0 1 10 3.0
0 1 10 5.0
1 2 20 4.0
1 2 20 6.0
1 2 20 10.0
2 3 30 6.0
2 3 30 9.0
2 3 30 15.0
CodePudding user response:
Here is one approach:
import pandas as pd
df = pd.DataFrame({
"ID": [1, 2, 3],
"values": [10, 20, 30]
})
ratios = [2, 3, 5]
df = (
df.assign(values=df["values"].apply(lambda x: [x * (ratio / sum(ratios)) for ratio in ratios]))
.explode("values")
)
print(df)
In essence, we aim to create cells with lists under the "values" column so that we can take advantage of a DataFrame
's explode
method which melts cells containing lists into individual cells.
To make these lists we use the apply
method on the "values" Series
(pandas term for column of a DataFrame). This function:
lambda x: [x * (ratio / sum(ratios)) for ratio in ratios]
is an anonymous function that receives a number and splits out a list of the ratios. For example, when x is 10:
10 * (2 / 10) = 2
10 * (3 / 10) = 3
10 & (5 / 10) = 5
Therefore [2, 3, 5]
Then for the next value:
20 * (2 / 10) = 4
20 * (3 / 10) = 6
20 * (5 / 10) = 10
Therefore [4, 6, 10]
etc., which results in the intermediate dataframe:
ID values
0 1 [2.0, 3.0, 5.0]
1 2 [4.0, 6.0, 10.0]
2 3 [6.0, 9.0, 15.0]
Using the explode
method on this dataframe produces your desired result.
CodePudding user response:
here is one way to do it
ratio = [2, 3, 5]
ratio_dec = np.divide(ratio, sum(ratio))
df['ratio'] = df['values'].apply(lambda x: np.round(np.multiply(x, ratio_dec) ,0))
df.explode('ratio')
ID values ratio
0 1 10 2.0
0 1 10 3.0
0 1 10 5.0
1 2 20 4.0
1 2 20 6.0
1 2 20 10.0
2 3 30 6.0
2 3 30 9.0
2 3 30 15.0
CodePudding user response:
Here's a way:
ratio = [2, 3, 5]
df = df.assign(**{f'ratio_{i}':df['values'] * x / sum(ratio)
for i, x in enumerate(ratio)}).set_index(['ID', 'values']).stack().to_frame(
'values').reset_index(level=0).reset_index(drop=True)
Output:
ID values
0 1 2.0
1 1 3.0
2 1 5.0
3 2 4.0
4 2 6.0
5 2 10.0
6 3 6.0
7 3 9.0
8 3 15.0