Home > Enterprise >  Python Pandas calculate total volume with last article volume
Python Pandas calculate total volume with last article volume

Time:11-16

I have the following problem and do not know how to solve it in a perfomant way:

Input Pandas DataFrame:

timestep article volume
35 1 20
37 2 5
123 2 12
155 3 10
178 2 23
234 1 17
478 1 28

Output Pandas DataFrame:

timestep volume
35 20
37 25
123 32
178 53
234 50
478 61

Calculation Example for timestep 478: 28 (last article 1 volume) 23 (last article 2 volume) 10 (last article 3 volume) = 61

What ist the best way to do this in pandas?

CodePudding user response:

Try with ffill:

#sort if needed
df = df.sort_values("timestep")

df["volume"] = (df["volume"].where(df["article"].eq(1)).ffill().fillna(0)   
                df["volume"].where(df["article"].eq(2)).ffill().fillna(0))
output = df.drop("article", axis=1)

>>> output
  timestep  volume
0        35    20.0
1        37    25.0
2       123    32.0
3       178    43.0
4       234    40.0
5       478    51.0

CodePudding user response:

Group By article & Take last element & Sum

 df.groupby(['article']).tail(1)["volume"].sum()

CodePudding user response:

You can set group number of consecutive article by .cumsum(). Then get the value of previous group last item by .map() with GroupBy.last(). Finally, add volume with this previous last, as follows:

# Get group number of consecutive `article`
g = df['article'].ne(df['article'].shift()).cumsum() 

# Add `volume` to previous group last
df['volume']  = g.sub(1).map(df.groupby(g)['volume'].last()).fillna(0, downcast='infer')

Result:

print(df)

   timestep  article  volume
0        35        1      20
1        37        2      25
2       123        2      32
3       178        2      43
4       234        1      40
5       478        1      51

Breakdown of steps

Previous group last values:

g.sub(1).map(df.groupby(g)['volume'].last()).fillna(0, downcast='infer')


0     0
1    20
2    20
3    20
4    43
5    43
Name: article, dtype: int64

CodePudding user response:

Try:

df["new_volume"] = (
    df.loc[df["article"] != df["article"].shift(-1), "volume"]
        .reindex(df.index, method='ffill')
        .shift()
        df["volume"]
).fillna(df["volume"])

df

Output:

   timestep  article  volume  new_volume
0        35        1      20        20.0
1        37        2       5        25.0
2       123        2      12        32.0
3       178        2      23        43.0
4       234        1      17        40.0
5       478        1      28        51.0

Explained:

Find the last record of each group by checking the 'article' from the previous row, then reindex that series aligning to the original dataframe and fill forward and shift to the next group with that 'volume'. And this to the current row's 'volume' and fill that first value with the original 'volume' value.

  • Related