Home > Mobile >  Create new column based on relative size of values in other columns, data.table
Create new column based on relative size of values in other columns, data.table

Time:10-11

I have a table with two columns of values between 0 and 1. For example:

set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table

   value1     value2
 0.2875775 0.95683335
 0.7883051 0.45333416
 0.4089769 0.67757064
 0.8830174 0.57263340
 0.9404673 0.10292468
 0.0455565 0.89982497
 0.5281055 0.24608773
 0.8924190 0.04205953
 0.5514350 0.32792072
 0.4566147 0.95450365

I would like to use data.table to make a new binary column, assigning 1 to the x rows with the largest difference between value2 and value1. I can make a "difference" column like this:

table[,difference:=value1-value2]

And I can find the x largest differences using order and tail e.g. if x is 5:

x<-5
table[order(difference), tail(.SD, x)]

But I haven't been able to figure out a way to combine these with something like ifelse or case_when to assign 1 to the x largest differences and 0 to the rest.

CodePudding user response:

setorderv(table, "difference", order = -1)

table[, large := 0]
x <- 5
table[1:x, large := 1]

CodePudding user response:

I hope this solves your issue:

library(data.table)
library(dtplyr)
library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table
#>        value1     value2
#>  1: 0.2875775 0.95683335
#>  2: 0.7883051 0.45333416
#>  3: 0.4089769 0.67757064
#>  4: 0.8830174 0.57263340
#>  5: 0.9404673 0.10292468
#>  6: 0.0455565 0.89982497
#>  7: 0.5281055 0.24608773
#>  8: 0.8924190 0.04205953
#>  9: 0.5514350 0.32792072
#> 10: 0.4566147 0.95450365

x <- 5

table <- table %>% 
  lazy_dt() %>% 
  mutate(difference = value1 - value2) %>% 
  arrange(desc(difference)) %>% 
  mutate(difference = ifelse(test = row_number() <= x, yes = 1, no = 0)) %>% 
  as.data.table()

table
#>        value1     value2 difference
#>  1: 0.8924190 0.04205953          1
#>  2: 0.9404673 0.10292468          1
#>  3: 0.7883051 0.45333416          1
#>  4: 0.8830174 0.57263340          1
#>  5: 0.5281055 0.24608773          1
#>  6: 0.5514350 0.32792072          0
#>  7: 0.4089769 0.67757064          0
#>  8: 0.4566147 0.95450365          0
#>  9: 0.2875775 0.95683335          0
#> 10: 0.0455565 0.89982497          0

Greets, M.

Created on 2021-10-11 by the reprex package (v2.0.1)

  • Related