Home > Back-end >  R dplyr: conditionally swap the elements of two columns
R dplyr: conditionally swap the elements of two columns

Time:10-06

A simple question, but I cannot get it done in a clean way. Consider the dataframe df at the end of the post. I simply would like to swap the elements of columns x and y whenever x>y. There may be other columns in the dataframe which I do not want to touch. In a sense, I would like to sort row wise the columns x and y. Any idea about how to achieve this? Thanks!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union



df<-tibble(x=1:10, y=10:1, extra=LETTERS[1:10])
   

df
#> # A tibble: 10 × 3
#> # Rowwise: 
#>        x     y extra
#>    <int> <int> <chr>
#>  1     1    10 A    
#>  2     2     9 B    
#>  3     3     8 C    
#>  4     4     7 D    
#>  5     5     6 E    
#>  6     6     5 F    
#>  7     7     4 G    
#>  8     8     3 H    
#>  9     9     2 I    
#> 10    10     1 J

Created on 2021-10-06 by the reprex package (v2.0.1)

CodePudding user response:

base solution:

use which(df$x > df$y) to determine row numbers you want to change, then use rev to swap values for these:

df[which(df$x > df$y), c("x", "y")] <- rev(df[which(df$x > df$y), c("x", "y")])
df
#        x     y extra
#    <int> <int> <chr>
#  1     1    10 A    
#  2     2     9 B    
#  3     3     8 C    
#  4     4     7 D    
#  5     5     6 E    
#  6     5     6 F    
#  7     4     7 G    
#  8     3     8 H    
#  9     2     9 I    
# 10     1    10 J  

CodePudding user response:

This looks like sorting for me:

library(tidyverse)
df <- tibble(x=1:10, y=10:1, extra=LETTERS[1:10])
df
#> # A tibble: 10 x 3
#>        x     y extra
#>    <int> <int> <chr>
#>  1     1    10 A    
#>  2     2     9 B    
#>  3     3     8 C    
#>  4     4     7 D    
#>  5     5     6 E    
#>  6     6     5 F    
#>  7     7     4 G    
#>  8     8     3 H    
#>  9     9     2 I    
#> 10    10     1 J

extra_cols <- df %>% colnames() %>% setdiff(c("x", "y"))
extra_cols
#> [1] "extra"

df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-c(row, extra_cols)) %>%
  group_by_at(c("row", extra_cols)) %>%
  transmute(
    value = value %>% sort(),
    name = c("x", "y"),
  ) %>%
  pivot_wider() %>%
  ungroup() %>%
  select(-row)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(extra_cols)` instead of `extra_cols` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 10 x 3
#>    extra     x     y
#>    <chr> <int> <int>
#>  1 A         1    10
#>  2 B         2     9
#>  3 C         3     8
#>  4 D         4     7
#>  5 E         5     6
#>  6 F         5     6
#>  7 G         4     7
#>  8 H         3     8
#>  9 I         2     9
#> 10 J         1    10

Created on 2021-10-06 by the reprex package (v2.0.1)

CodePudding user response:

Try using apply on axis 1 and transpose it with t, then use as_tibble to convert it to a tibble.

Then finally change the column names:

> df <- as_tibble(t(apply(df, 1, sort)))
> names(df) <- c('x', 'y')
> df
# A tibble: 10 x 2
       x     y
   <int> <int>
 1     1    10
 2     2     9
 3     3     8
 4     4     7
 5     5     6
 6     5     6
 7     4     7
 8     3     8
 9     2     9
10     1    10

CodePudding user response:

Thanks everyone!

I wrote a small function which does what I need and generalizes to the case of multiple variables. See the reprex

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(1234)

set_colnames <- `colnames<-`

df<-tibble(x=1:10, y=10:1, z=rnorm(10), extra=LETTERS[1:10]) %>%
    rowwise() 

df
#> # A tibble: 10 × 4
#> # Rowwise: 
#>        x     y      z extra
#>    <int> <int>  <dbl> <chr>
#>  1     1    10 -1.21  A    
#>  2     2     9  0.277 B    
#>  3     3     8  1.08  C    
#>  4     4     7 -2.35  D    
#>  5     5     6  0.429 E    
#>  6     6     5  0.506 F    
#>  7     7     4 -0.575 G    
#>  8     8     3 -0.547 H    
#>  9     9     2 -0.564 I    
#> 10    10     1 -0.890 J


sort_rows <- function(df, col_names, dec=F){

    temp <- df %>%
        select(all_of(col_names))

    extra_names <- setdiff(colnames(df), col_names)

    temp2 <- df %>%
        select(all_of(extra_names))
    

    res <- t(apply(temp, 1, sort, decreasing=dec)) %>%
        as_tibble %>%
        set_colnames(col_names) %>%
        bind_cols(temp2)

    return(res)
    


}



col_names <- c("x", "y", "z")

df_s <- df %>%
    sort_rows(col_names, dec=F)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.


df_s
#> # A tibble: 10 × 4
#>         x     y     z extra
#>     <dbl> <dbl> <dbl> <chr>
#>  1 -1.21      1    10 A    
#>  2  0.277     2     9 B    
#>  3  1.08      3     8 C    
#>  4 -2.35      4     7 D    
#>  5  0.429     5     6 E    
#>  6  0.506     5     6 F    
#>  7 -0.575     4     7 G    
#>  8 -0.547     3     8 H    
#>  9 -0.564     2     9 I    
#> 10 -0.890     1    10 J

Created on 2021-10-06 by the reprex package (v2.0.1)

  • Related