Home > Net >  Calculate difference between consecutive rows per group in dataframe Julia
Calculate difference between consecutive rows per group in dataframe Julia

Time:12-27

I have the following dataframe called df:

using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "B", "B", "B", "B"],
               value = [2,1,4,3,3,5,2,1])

8×2 DataFrame
 Row │ group   value 
     │ String  Int64 
─────┼───────────────
   1 │ A           2
   2 │ A           1
   3 │ A           4
   4 │ A           3
   5 │ B           3
   6 │ B           5
   7 │ B           2
   8 │ B           1

I would like to calculate the difference with previous values of consecutive rows in column value per group. The offset should have NaN, 0, or missing. Here is the desired output:

8×3 DataFrame
 Row │ group   value  diff    
     │ String  Int64  Float64 
─────┼────────────────────────
   1 │ A           2    NaN
   2 │ A           1     -1.0
   3 │ A           4      3.0
   4 │ A           3     -1.0
   5 │ B           3    NaN
   6 │ B           5      2.0
   7 │ B           2     -3.0
   8 │ B           1     -2.0

So I was wondering if anyone knows how to calculate the difference between consecutive rows per group in Julia?

CodePudding user response:

Using DataFrames.jl (you can replace missing by any value you like):

julia> select(groupby(df, :group),
              :value => (x -> [missing; diff(x)]) => :diff)
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1

Using DataFramesMeta.jl:

julia> @chain df begin
           groupby(:group)
           @select :diff = [missing; diff(:value)]
       end
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1

Normally diff in Julia like in e.g. R would produce one less row (and the syntax would be simpler:

julia> combine(groupby(df, :group), :value => diff => :diff)
6×2 DataFrame
 Row │ group   diff
     │ String  Int64
─────┼───────────────
   1 │ A          -1
   2 │ A           3
   3 │ A          -1
   4 │ B           2
   5 │ B          -3
   6 │ B          -1

julia> @chain df begin
           groupby(:group)
           @combine :diff = diff(:value)
       end
6×2 DataFrame
 Row │ group   diff
     │ String  Int64
─────┼───────────────
   1 │ A          -1
   2 │ A           3
   3 │ A          -1
   4 │ B           2
   5 │ B          -3
   6 │ B          -1

Yet another way would be to use lag from ShiftedArrays.jl:

julia> using ShiftedArrays: lag

julia> @chain df begin
           groupby(:group)
           @combine :diff = :value - lag(:value)
       end
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1
  • Related