A dplyr-compatible (%>%) user function / verb

Time:10-16

I am trying to write a custom function that I can apply to some data frames using dplyr pipes. The function should perform several manipulations on the selected columns, such as replacing commas with dots, extracting numbers, and converting them to numeric values. Here is a reduced example:

library(dplyr)
library(stringr)

parties <- c("SPD", "B90_Gruene")

df <- data.frame(
  SPD = c("26 %", "25 %", "25 %", "26 %", "26 %"),
  B90_Gruene = c("17 %", "16 %", "17 %", "15 %", "15 %"))

rem_per_cent <- function(.data, columns) {
  nd <- .data      
  for (v in columns){
    nd <- nd %>% mutate("{{v}}" := unlist(str_split(.data[[v]], "%"))[1])
  }
  return(nd)
}

df %>% rem_per_cent(parties)

The output is wrong. The first value fills the entire column:

   SPD B90_Gruene "SPD" "B90_Gruene"
1 26 %       17 %   26           17 
2 25 %       16 %   26           17 
3 25 %       17 %   26           17 
4 26 %       15 %   26           17 
5 26 %       15 %   26           17 

Replacing unlist()[1] with head(, 1) gives much the same result.

When unlist()[1] is removed, it turns out that the output of str_split() itself is correct:

   SPD B90_Gruene "SPD" "B90_Gruene"
1 26 %       17 % 26 ,         17 , 
2 25 %       16 % 25 ,         16 , 
3 25 %       17 % 25 ,         17 , 
4 26 %       15 % 26 ,         15 , 
5 26 %       15 % 26 ,         15 , 

I would like to understand why this function does not work. Secondly, by using "{{v}}" I intended to replace the original variables rather than create those strange extra columns. Many thanks!

Answer:

str_split() returns a list with one character vector per input string. unlist() flattens that list into a single character vector, so unlist(str_split(.data[[v]], "%"))[1] is only the first piece of the first row. That single string is then recycled to all rows of the data frame.
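To see this outside of mutate(), here is a minimal sketch on a plain character vector. The vector x and the variable names are made up for illustration; the vapply() call at the end is just one possible way to take the first piece of each element, not the only one.

```r
library(stringr)

# Illustrative input, same shape as one column of the question's data frame
x <- c("26 %", "25 %", "25 %")

# str_split() is vectorised: it returns a list with one character vector per input string
pieces <- str_split(x, "%")

# unlist() flattens the list, so [1] picks a single string for the whole vector
first_only <- unlist(pieces)[1]

# Taking the first piece of each list element keeps one value per input string
first_each <- vapply(pieces, function(p) p[1], character(1))

print(first_only)
print(first_each)
```

Because first_only has length 1, mutate() recycles it to every row, which is the behaviour seen in the question.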

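The correct syntax for your mutate() is {{v}} := ..., without the double quotes ("). Inside a string, "{{v}}" is glue syntax that inserts the label of the expression, and because v holds a string, that label includes the quotation marks. You are effectively hard-coding the quotes into the variable name, so you create new columns rather than overwriting the existing ones.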
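If you would rather keep a string on the left-hand side, the glue form with single braces, "{v}", inserts the value of v as the column name. The following is a sketch under that assumption; the function name rem_per_cent_glue is hypothetical, and str_remove() is used here only as a vectorised stand-in for the split so the example stays focused on the naming.

```r
library(dplyr)
library(stringr)

parties <- c("SPD", "B90_Gruene")

df <- data.frame(
  SPD = c("26 %", "25 %", "25 %", "26 %", "26 %"),
  B90_Gruene = c("17 %", "16 %", "17 %", "15 %", "15 %"))

# Hypothetical variant of the question's function, for illustration only
rem_per_cent_glue <- function(.data, columns) {
  nd <- .data
  for (v in columns) {
    # "{v}" (single braces) is replaced by the string stored in v,
    # so the existing column is overwritten instead of a new one being created
    nd <- nd %>% mutate("{v}" := str_remove(.data[[v]], "\\s*%$"))
  }
  nd
}

out <- df %>% rem_per_cent_glue(parties)
print(out)
```

The result should keep exactly the two original column names, with the percent signs stripped.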

You can get the effect you want by adding a rowwise() to your pipe:

rem_per_cent <- function(.data, columns) {
  nd <- .data      
  for (v in columns){
    nd <- nd %>% rowwise() %>% mutate({{v}} := unlist(str_split(.data[[v]], "%"))[1])
  }
  return(nd)
}

df %>% rem_per_cent(parties)
# A tibble: 5 × 2
# Rowwise: 
  SPD   B90_Gruene
  <chr> <chr>     
1 "26 " "17 "     
2 "25 " "16 "     
3 "25 " "17 "     
4 "26 " "15 "     
5 "26 " "15 "     

To remove the rowwise annotation, add an ungroup() to the pipe.

Or you can avoid the need for a custom function with:

df %>% rowwise() %>% mutate(across(everything(), ~unlist(str_split(.x, "%"))[1]))

This gives the same result.
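Since the question also mentions replacing commas with dots and converting to numeric, here is a sketch that goes one step further and avoids rowwise() by using only vectorised stringr functions. The function name pct_to_numeric and the regular expression are my own assumptions, not something from the question; adjust the pattern if your real data contains other formats.

```r
library(dplyr)
library(stringr)

parties <- c("SPD", "B90_Gruene")

df <- data.frame(
  SPD = c("26 %", "25 %", "25 %", "26 %", "26 %"),
  B90_Gruene = c("17 %", "16 %", "17 %", "15 %", "15 %"))

# Hypothetical helper: works on whole columns, so no rowwise() is needed
pct_to_numeric <- function(.data, columns) {
  .data %>%
    mutate(across(
      all_of(columns),
      # decimal comma -> dot, keep the first number, then convert to numeric
      ~ as.numeric(str_extract(str_replace(.x, ",", "."), "[0-9]+\\.?[0-9]*"))
    ))
}

out <- df %>% pct_to_numeric(parties)
print(out)
```

all_of() selects the columns named in a character vector, so the function still takes the column names as strings, as in the original rem_per_cent().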
