Home > Software engineering >  How can I relocate mutated columns next to the original colums?
How can I relocate mutated columns next to the original colums?

Time:04-01

I have made a function which mutates across columns and creates new named columns from each of them. The new colums are put to the right side of the dataframe whereas I would like to have them adjacent to each of the original columns. I am looking for a solution that will generalise to any dataframe this function might be used on, writing a select statement to reorder the columns is not automatic enough for my use case.

test_data <- data.frame(data_col_1 = c(1,2,3),
                        data_col_2 = c(1,2,3),
                        data_col_3 = c(1,2,3),
                        another_column = c("a","b","c"))


perc_funct <- function(df, columns, numerator){
  
p_f <- function(x, numerator){
  
  (x/numerator)*100
}
    j <- df %>%
     mutate( across({{columns}}, 
                    .fns = list(perc = ~p_f(.x, numerator)),
                    .names = "{col}_{fn}"))# need to figure out a way to get the columns ordered 
return(j)
}

test_data %>% perc_funct(columns = starts_with("data"), numerator = 1)

The output currently puts all the new colums to the right.

"data_col_1" "data_col_2" "data_col_3" "another_column" "data_col_1_perc" "data_col_2_perc" "data_col_3_perc"

The output I want puts each new colums to the right of each old column. "data_col_1" "data_col_1_perc" "data_col_2" "data_col_2_perc" "data_col_3" "data_col_3_perc" "another_column"

CodePudding user response:

I typically sort the columns with select(sort(names(.))) afterwards:

library(dplyr)

test_data %>% 
  perc_funct(columns = starts_with("data"), numerator = 1) %>% 
  select(sort(names(.)))

#>   data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3
#> 1          1             100          1             100          1
#> 2          2             200          2             200          2
#> 3          3             300          3             300          3
#>   data_col_3_perc
#> 1             100
#> 2             200
#> 3             300

Created on 2022-04-01 by the reprex package (v2.0.1)

What if I have other columns I want to keep in the same spot?

It's just a matter of nesting my solution above together with other select statements or dplyr verbs. You might have to save the dataframe with the unsorted columns as a intermediate step.

Example 1

Here is an example with three other columns, where I want some to come first, some to come last, and others to come anywhere but stay together.

library(dplyr)

df <- 
  test_data %>% 
  mutate(first_col = 1, other_columns = 100, last_col = 999) %>%
  perc_funct(columns = starts_with("data"), numerator = 1)

# Unsorted:
df %>% names()
#> [1] "data_col_1"      "data_col_2"      "data_col_3"      "first_col"      
#> [5] "other_columns"   "last_col"        "data_col_1_perc" "data_col_2_perc"
#> [9] "data_col_3_perc"

# Sorted:
df %>% 
  select(
    first_col,
    df %>% select(starts_with("data")) %>% names() %>% sort(), 
    everything(),
    last_col
  ) 
#>   first_col data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3
#> 1         1          1             100          1             100          1
#> 2         1          2             200          2             200          2
#> 3         1          3             300          3             300          3
#>   data_col_3_perc other_columns last_col
#> 1             100           100      999
#> 2             200           100      999
#> 3             300           100      999

Created on 2022-04-01 by the reprex package (v2.0.1)

Example 2

There's also an alternative using col_bind():

If you just want your new columns last, but sorted together with the columns they were created from, you can also do something like:

library(dplyr)
df %>% 
  select(
    -starts_with("data")
  ) %>% bind_cols(
    df %>% 
      select(
        df %>% select(starts_with("data")) %>% names() %>% sort()
      )
  )
#>   first_col other_columns last_col data_col_1 data_col_1_perc data_col_2
#> 1         1           100      999          1             100          1
#> 2         1           100      999          2             200          2
#> 3         1           100      999          3             300          3
#>   data_col_2_perc data_col_3 data_col_3_perc
#> 1             100          1             100
#> 2             200          2             200
#> 3             300          3             300

CodePudding user response:

The recommended way to move columns using dplyr (since version 1.0.0) is to use relocate(). relocate() supports tidyselect semantics but importantly acts only on the selected column(s) leaving all other columns in place. In your case, you can grep() and sort() on the columns beginning with data.

test_data <- data.frame(column_1 = 1:3,
                        data_col_1 = c(1,2,3),
                        data_col_2 = c(1,2,3),
                        data_col_3 = c(1,2,3),
                        another_column = c("a","b","c"))


test_data %>%
  perc_funct(columns = starts_with("data"), numerator = 1) %>%
  relocate(sort(grep("^data", names(.), value = TRUE)), .before = data_col_1)

  column_1 data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3 data_col_3_perc another_column
1        1          1             100          1             100          1             100              a
2        2          2             200          2             200          2             200              b
3        3          3             300          3             300          3             300              c

The .before (or .after) argument specifies where to relocate the columns, in this case you can place them before data_col_1.

  • Related