Calculate difference with closest conditioned value in Julia

Time:01-04

I have the following DataFrame:

using DataFrames

df = DataFrame(
    condition = [false, false, true, false, false, false, true, false, false, false],
    time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Output:

10×2 DataFrame
 Row │ condition  time  
     │ Bool       Int64 
─────┼──────────────────
   1 │     false      1
   2 │     false      2
   3 │      true      3
   4 │     false      4
   5 │     false      5
   6 │     false      6
   7 │      true      7
   8 │     false      8
   9 │     false      9
  10 │     false     10

I would like to calculate, for each row, the difference in rows to the nearest row where condition is true. For example, for row 1 the nearest true is 2 rows away. Rows where condition is true should have a value of 0. Here is the desired output:

10×3 DataFrame
 Row │ condition  time   diff  
     │ Bool       Int64  Int64 
─────┼─────────────────────────
   1 │     false      1      2
   2 │     false      2      1
   3 │      true      3      0
   4 │     false      4      1
   5 │     false      5      2
   6 │     false      6      1
   7 │      true      7      0
   8 │     false      8      1
   9 │     false      9      2
  10 │     false     10      3

Does anyone know how to calculate the difference in rows to the closest conditioned value in a Julia DataFrame?

Answer:

transform(df, :condition =>
  (w->((f,u)->min.(f(u),reverse(f(reverse(u)))))(
  v->accumulate(
    (x,y)->ifelse(y,0,x+1),
    v;init=length(v)
  ), 
  w
  )) => :diff)

(u, v and w are vectors; x is the running Int count and y is the current Bool element; f is a function.)
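To make the nested lambdas easier to follow, here is a minimal sketch that splits the same computation into named steps. The names since_last_true, forward, backward and nearest are assumptions introduced for illustration; they do not appear in the original answer.

```julia
condition = [false, false, true, false, false, false, true, false, false, false]

# Running count of rows since the last `true`. It starts at length(v), so rows
# that come before the first `true` get a value larger than any real distance.
since_last_true(v) = accumulate((x, y) -> ifelse(y, 0, x + 1), v; init = length(v))

forward  = since_last_true(condition)                    # rows since the previous true
backward = reverse(since_last_true(reverse(condition)))  # rows until the next true
nearest  = min.(forward, backward)                       # distance to the closest true
```

For this input, forward is [11, 12, 0, 1, 2, 3, 0, 1, 2, 3] and backward is [2, 1, 0, 3, 2, 1, 0, 13, 12, 11], so the element-wise minimum gives the diff column shown in the question.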

This does the job, with output:

10×3 DataFrame
 Row │ condition  time   diff  
     │ Bool       Int64  Int64 
─────┼─────────────────────────
   1 │     false      1      2
   2 │     false      2      1
   3 │      true      3      0
   4 │     false      4      1
   5 │     false      5      2
   6 │     false      6      1
   7 │      true      7      0
   8 │     false      8      1
   9 │     false      9      2
  10 │     false     10      3

In my REPL it was the one-liner below; the version above is an attempt to make it more readable:

transform(df, :condition => (v->((f, v)->min.(f(v),reverse(f(reverse(v)))))(v->accumulate((x, y)->ifelse(y, 0, x+1), v; init=length(v)), v)) => :diff)

This is neither the clearest nor the most efficient approach, but it is short. For clearer and more efficient code, a separate function should be defined.
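One possible shape for such a separate function is sketched below. The name nearest_true_distance is hypothetical, and the two-pass loop is just one way to write it, not necessarily what the answerer had in mind. It avoids the temporary reversed vectors by scanning the column once forward and once backward.

```julia
# Hypothetical helper (the name is an assumption): distance in rows from each
# position to the nearest `true`. Assumes the input has at least one `true`.
function nearest_true_distance(cond::AbstractVector{Bool})
    n = length(cond)
    dist = similar(cond, Int)
    d = n                                # sentinel, larger than any real distance
    for i in eachindex(cond)             # forward pass: rows since the previous true
        d = cond[i] ? 0 : d + 1
        dist[i] = d
    end
    d = n
    for i in reverse(eachindex(cond))    # backward pass: rows until the next true
        d = cond[i] ? 0 : d + 1
        dist[i] = min(dist[i], d)
    end
    return dist
end

result = nearest_true_distance(
    [false, false, true, false, false, false, true, false, false, false])
```

With the DataFrame from the question, this could be applied as transform(df, :condition => nearest_true_distance => :diff).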

One last point: the column must contain at least one true value, otherwise the results are not meaningful (this is easy to check with a bit more code, but I am not sure what the OP wants in that case).
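As a rough illustration of that caveat, the sketch below shows what the accumulate-based approach returns for an all-false column, followed by one possible guard. The name check_has_true and the choice to throw an ArgumentError are assumptions; returning missing values would be another reasonable option.

```julia
since_last_true(v) = accumulate((x, y) -> ifelse(y, 0, x + 1), v; init = length(v))

# With no `true` at all, every value comes from the sentinel and is not a distance.
all_false = fill(false, 4)
bogus = min.(since_last_true(all_false), reverse(since_last_true(reverse(all_false))))

# Hypothetical guard (name and error type are assumptions): reject such input.
function check_has_true(cond::AbstractVector{Bool})
    any(cond) || throw(ArgumentError("condition column contains no true value"))
    return cond
end
```

Here bogus is [5, 6, 6, 5], all larger than the column length of 4, which is why the check is worth doing before trusting the result.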
