I have the following problem and do not know how to solve it in a perfomant way:
Input Pandas DataFrame:
timestep | article | volume |
---|---|---|
35 | 1 | 20 |
37 | 2 | 5 |
123 | 2 | 12 |
155 | 3 | 10 |
178 | 2 | 23 |
234 | 1 | 17 |
478 | 1 | 28 |
Output Pandas DataFrame:
timestep | volume |
---|---|
35 | 20 |
37 | 25 |
123 | 32 |
178 | 53 |
234 | 50 |
478 | 61 |
Calculation Example for timestep 478: 28 (last article 1 volume) 23 (last article 2 volume) 10 (last article 3 volume) = 61
What ist the best way to do this in pandas?
CodePudding user response:
Try with ffill
:
#sort if needed
df = df.sort_values("timestep")
df["volume"] = (df["volume"].where(df["article"].eq(1)).ffill().fillna(0)
df["volume"].where(df["article"].eq(2)).ffill().fillna(0))
output = df.drop("article", axis=1)
>>> output
timestep volume
0 35 20.0
1 37 25.0
2 123 32.0
3 178 43.0
4 234 40.0
5 478 51.0
CodePudding user response:
Group By article & Take last element & Sum
df.groupby(['article']).tail(1)["volume"].sum()
CodePudding user response:
You can set group number of consecutive article
by .cumsum()
. Then get the value of previous group last item by .map()
with GroupBy.last()
. Finally, add volume
with this previous last, as follows:
# Get group number of consecutive `article`
g = df['article'].ne(df['article'].shift()).cumsum()
# Add `volume` to previous group last
df['volume'] = g.sub(1).map(df.groupby(g)['volume'].last()).fillna(0, downcast='infer')
Result:
print(df)
timestep article volume
0 35 1 20
1 37 2 25
2 123 2 32
3 178 2 43
4 234 1 40
5 478 1 51
Breakdown of steps
Previous group last values:
g.sub(1).map(df.groupby(g)['volume'].last()).fillna(0, downcast='infer')
0 0
1 20
2 20
3 20
4 43
5 43
Name: article, dtype: int64
CodePudding user response:
Try:
df["new_volume"] = (
df.loc[df["article"] != df["article"].shift(-1), "volume"]
.reindex(df.index, method='ffill')
.shift()
df["volume"]
).fillna(df["volume"])
df
Output:
timestep article volume new_volume
0 35 1 20 20.0
1 37 2 5 25.0
2 123 2 12 32.0
3 178 2 23 43.0
4 234 1 17 40.0
5 478 1 28 51.0
Explained:
Find the last record of each group by checking the 'article' from the previous row, then reindex that series aligning to the original dataframe and fill forward and shift to the next group with that 'volume'. And this to the current row's 'volume' and fill that first value with the original 'volume' value.