Home > other >  Select specific/all columns in rowwise
Select specific/all columns in rowwise

Time:12-03

I have the following table:

col1 col2 col3 col4
1 2 1 4
5 6 6 3

My goal is to find the max value per each row, and then find how many times it was repeated in the same row.

The resulting table should look like this:

col1 col2 col3 col4 max_val repetition
1 2 1 4 4 1
5 6 6 3 6 2

Now to achieve this, I am doing the following for Max:

df%>% rowwise%>%
mutate(max=max(col1:col4))

However, I am struggling to find the repetition. My idea is to use this pseudo code in mutate: sum( "select current row entirely or only for some columns"==max). But I don't know how to select entire row or only some columns of it and use its content to do the check, i.e.: is it equal to the max. How can we do this in dplyr?

CodePudding user response:

A dplyr approach:

library(dplyr)
df %>% 
  rowwise() %>% 
  mutate(max_val = max(across(everything())),
         repetition = sum(across(col1:col4) == max_val))

# A tibble: 2 × 6
# Rowwise: 
   col1  col2  col3  col4 max_val repetition
  <int> <int> <int> <int>   <int>      <int>
1     1     2     1     4       4          1
2     5     6     6     3       6          2

An R base approach:

df$max_val <- apply(df,1,max)
df$repetition <- rowSums(df[, 1:4] == df[, 5])

CodePudding user response:

For other (non-tidyverse) readers, a base R approach could be:

df$max_val  <- apply(df, 1, max)
df$repetition <- apply(df, 1, function(x) sum(x[1:4] == x[5]))

Output:

#   col1 col2 col3 col4 max_val repetition
# 1    1    2    1    4       4          1
# 2    5    6    6    3       6          2

CodePudding user response:

Although dplyr has added many tools for working across rows of data, it remains, in my mind at least, much easier to adhere to tidy principles and always convert the data to "long" format for these kinds of operations.

Thus, here is a tidy approach:

df %>%
    mutate(row = row_number()) %>%
    pivot_longer(cols = -row) %>%
    group_by(row) %>%
    mutate(max_val = max(value), repetitions = sum(value == max(value))) %>%
    pivot_wider(id_cols = c(row, max_val, repetitions)) %>%
    select(col1:col4, max_val, repetitions)

The last select() is just to get the columns in the order you want.

  • Related