df1 = data.frame(
y1 = c(1, -2, 3),
y2 = c(4, 5, 6),out = c("*", "*", "")
)
Create another dataframe
df2 = data.frame(
y1 = c(7, -3, 9),
y2 = c(1, 4, 6),out = c("", "*", "")
)
Create list of data frame using list()
lis = list(df1, df2)
I want to compare row by row (first row from index [[1]] with first row from index [[2]] and so on
if both rows have * or empty (similar) under out, then select the row with the highest absolute value in y1 (and put under ind the index)
otherwise, take the y1 value of the row with * under out (and put under ind the index)
example of desired output:
y1 y2 out ind
1 4 * 1
-3 4 * 2
9 6 2
CodePudding user response:
An option is to bind the dataset with bind_rows
and specify .id
to create an identifier for the list
sequence. Then grouped by the rowid
of the 'ind' column, either we arrange
the rows in the group so that *
values in out
will be first, then do the order on absolute values of 'y1' in desc
ending. We slice
the row with the max
absolute value in 'y1' if the number of distinct elements in 'out' is 1 or else return the 1st row (will be *
in out if there is one)
library(dplyr)
library(data.table)
bind_rows(lis, .id = 'ind') %>%
group_by(rn = rowid(ind)) %>%
arrange(out != "*", desc(abs(y1)), .by_group = TRUE) %>%
slice(if(n_distinct(out) == 1) which.max(abs(y1)) else 1) %>%
ungroup %>%
select(y1, y2, out, ind)
-output
# A tibble: 3 × 4
y1 y2 out ind
<dbl> <dbl> <chr> <chr>
1 1 4 "*" 1
2 -3 4 "*" 2
3 9 6 "" 2
Or without arrange
, after grouping, create a position index in slice
('i1') where the values in out are *
, then use that index to subset the y1 values to find the position where it is absolute max in else
case and return the position of 'i1' by indexing
bind_rows(lis, .id = 'ind') %>%
group_by(rn = rowid(ind)) %>%
slice({i1 <- which(out == "*")
if(n_distinct(out) == 1) which.max(abs(y1)) else
i1[which.max(abs(y1[i1]))]}) %>%
ungroup %>%
select(y1, y2, out, ind)
-output
# A tibble: 3 × 4
y1 y2 out ind
<dbl> <dbl> <chr> <chr>
1 1 4 "*" 1
2 -3 4 "*" 2
3 9 6 "" 2
Or use a similar ordering approach with data.table
library(data.table)
dt1 <- rbindlist(lis, idcol = "ind")[, rn := rowid(ind)]
dt2 <- dt1[order(rn, out != "*", desc(abs(y1)))]
dt2[dt2[, .I[if(uniqueN(out) == 1) which.max(abs(y1)) else
1L], .(rn)]$V1][, rn := NULL][]
ind y1 y2 out
1: 1 1 4 *
2: 2 -3 4 *
3: 2 9 6