Home > database >  how to select rows from lists in r?
how to select rows from lists in r?

Time:12-26

df1 = data.frame(
y1 = c(1, -2, 3),
y2 = c(4, 5, 6),out = c("*", "*", "")
 )

Create another dataframe

  df2 = data.frame(
   y1 = c(7, -3, 9),
   y2 = c(1, 4, 6),out = c("", "*", "")
   )

Create list of data frame using list()

   lis = list(df1, df2)

I want to compare row by row (first row from index [[1]] with first row from index [[2]] and so on

if both rows have * or empty (similar) under out, then select the row with the highest absolute value in y1 (and put under ind the index)

otherwise, take the y1 value of the row with * under out (and put under ind the index)

example of desired output:

 y1 y2 out ind
 1  4   *    1
-3  4   *    2
 9  6        2

CodePudding user response:

An option is to bind the dataset with bind_rows and specify .id to create an identifier for the list sequence. Then grouped by the rowid of the 'ind' column, either we arrange the rows in the group so that * values in out will be first, then do the order on absolute values of 'y1' in descending. We slice the row with the max absolute value in 'y1' if the number of distinct elements in 'out' is 1 or else return the 1st row (will be * in out if there is one)

library(dplyr)
library(data.table)
bind_rows(lis, .id = 'ind') %>% 
  group_by(rn = rowid(ind)) %>%
  arrange(out != "*", desc(abs(y1)), .by_group = TRUE) %>% 
  slice(if(n_distinct(out) == 1) which.max(abs(y1)) else 1) %>% 
  ungroup %>% 
  select(y1, y2, out, ind)

-output

# A tibble: 3 × 4
     y1    y2 out   ind  
  <dbl> <dbl> <chr> <chr>
1     1     4 "*"   1    
2    -3     4 "*"   2    
3     9     6 ""    2    

Or without arrange, after grouping, create a position index in slice ('i1') where the values in out are *, then use that index to subset the y1 values to find the position where it is absolute max in else case and return the position of 'i1' by indexing

bind_rows(lis, .id = 'ind') %>% 
  group_by(rn = rowid(ind)) %>%
  slice({i1 <- which(out == "*")
   if(n_distinct(out) == 1) which.max(abs(y1)) else 
      i1[which.max(abs(y1[i1]))]}) %>%
   ungroup %>% 
   select(y1, y2, out, ind)

-output

# A tibble: 3 × 4
     y1    y2 out   ind  
  <dbl> <dbl> <chr> <chr>
1     1     4 "*"   1    
2    -3     4 "*"   2    
3     9     6 ""    2    

Or use a similar ordering approach with data.table

library(data.table)
 dt1 <- rbindlist(lis, idcol = "ind")[, rn := rowid(ind)]
dt2 <- dt1[order(rn, out != "*", desc(abs(y1)))]
dt2[dt2[, .I[if(uniqueN(out) == 1) which.max(abs(y1)) else 
      1L], .(rn)]$V1][, rn := NULL][]
   ind y1 y2 out
1:   1  1  4   *
2:   2 -3  4   *
3:   2  9  6  
  • Related