Applying interaction() to user-specified column from within a tidyverse pipe

Time:10-16

As part of a much larger project, I am trying to create a new column in a data.frame called "unique_id" based on the interaction of user-specified variables. In this use case, the number and names of the variables needed will vary quite a bit with each user, so this flexibility is important. Some data.frames, such as the toy one in my example, will even come with a "unique id" variable, but this is quite rare. I included it in my example to be clear about what my desired output is.

Consider this toy data.frame:

mini_df <- data.frame(
  lat = c(41.23,37.37,41.23,39.01,32.00),
  lon = c(-120.79,-120.68,-120.79,-119.13,-120.00),
  station_id = c(300,527,300,228,72)
)

Outside of a proper function, it is quite easy to do something like this:

out_of_function_test_df <- mini_df %>%
  mutate(id = interaction(lat, lon))

Which produces what I want, namely:

    lat     lon station_id            id
1 41.23 -120.79        300 41.23.-120.79
2 37.37 -120.68        527 37.37.-120.68
3 41.23 -120.79        300 41.23.-120.79
4 39.01 -119.13        228 39.01.-119.13
5 32.00 -120.00         72       32.-120

I need this to work within a function in which the user specifies the interacting variables.

I have read many Stack Exchange posts that approach similar problems, but with some important differences from mine. The other questions address verbs other than mutate, attempt to apply different functions, or do not address the issue of multiple user-specified variables.

After reading these, trying many things, and reading this, the best I can come up with is the following:

create_unique_id <- function(df,
                             metadata_coords,
                             unique_id_coords) {
  df <- df %>%
    mutate_(id = interp(~interaction(args), # causes error with or without tilde
                        args = c("list", lapply(unique_id_coords, as.name))))

  return(df)
}

This produces an error:

Error in unique.default(x, nmax = nmax) : 
  unique() applies only to vectors

Here is the full traceback, if it is helpful:

22. unique.default(x, nmax = nmax)
21. unique(x, nmax = nmax)
20. factor(x)
19. as.factor(args[[i]])
18. interaction(list("list", lat, lon))
17. mutate_impl(.data, dots, caller_env())
16. mutate.tbl_df(tbl_df(.data), ...)
15. mutate(tbl_df(.data), ...)
14. as.data.frame(mutate(tbl_df(.data), ...))
13. mutate.data.frame(.data, !!!dots)
12. mutate(.data, !!!dots)
11. mutate_.data.frame(., id = interp(~interaction(args), args = c("list",
        lapply(unique_id_coords, as.name))))
10. mutate_(., id = interp(~interaction(args), args = c("list", lapply(unique_id_coords,
        as.name))))
9. function_list[[k]](value)
8. withVisible(function_list[[k]](value))
7. freduce(value, `_function_list`)
6. `_fseq`(`_lhs`)
5. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
3. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2. df %>% mutate_(id = interp(~interaction(args), args = c("list",
       lapply(unique_id_coords, as.name))))
1. create_unique_id(df = mini_df, metadata_coords = c("lat", "lon",
       "station_id"), unique_id_coords = c("lat", "lon"))

I do not have nearly enough background knowledge for this to be helpful to me. I am confused because the issue seems to be deep within the interaction() function. interaction() calls unique() along the way (which makes sense), but unique() is what ends up failing. Somehow, the initial call of interaction() within my function is different from when it was outside the function, but I am not sure how.

CodePudding user response:

Does this application of tidyr::unite() help?

create_unique_id <- function(df, 
                             unique_id_coords) {
  df <- df %>%
    tidyr::unite("id", {{ unique_id_coords }}, remove = FALSE)
  return(df)
}

Result

create_unique_id(mtcars, unique_id_coords = c(wt, qsec)) %>% head()

                   mpg cyl disp  hp drat          id    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90  2.62_16.46 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875_17.02 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85  2.32_18.61 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215_19.44 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15  3.44_17.02 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76  3.46_20.22 3.460 20.22  1  0    3    1
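
The question passes the column names as a character vector (c("lat", "lon")) rather than as bare names, so here is a minimal sketch of the same unite() idea adapted to that case. It assumes a tidyr version whose unite() accepts tidyselect helpers, and it sets sep = "." only to mimic the separator that interaction() uses. Note that the resulting id column is a character vector, not a factor.

```r
library(tidyr)

# Toy data from the question
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

# unique_id_coords is a character vector of column names;
# all_of() turns it into a column selection and errors if a name is missing
create_unique_id <- function(df, unique_id_coords) {
  unite(df, "id", tidyselect::all_of(unique_id_coords),
        sep = ".", remove = FALSE)
}

with_id <- create_unique_id(mini_df, unique_id_coords = c("lat", "lon"))
```

Using all_of() keeps the function signature from the question, so callers can build the vector of names programmatically.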

CodePudding user response:

Here is another option using the ellipsis (...):

library(rlang)
create_unique_id <- function(df, ...){
  
  df %>% 
    mutate(id = paste(!!! ensyms(...), sep = "_"))
  
}

Output

create_unique_id(mtcars, cyl, hp, vs) %>% head()

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb      id
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 6_110_0
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 6_110_0
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  4_93_1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 6_110_1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 8_175_0
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 6_105_1
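
If you want to keep interaction() itself and pass the names as a character vector, the following is a minimal sketch, assuming a dplyr version that supports !!! splicing. Judging from the traceback line interaction(list("list", lat, lon)), the original attempt seems to hand interaction() a list containing a string and unevaluated symbols rather than the column vectors, which would explain why unique() fails. Converting the names with rlang::syms() and splicing them builds the call interaction(lat, lon) before mutate() evaluates it.

```r
library(dplyr)

# Toy data from the question
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

# unique_id_coords is a character vector of column names
create_unique_id <- function(df, unique_id_coords) {
  df %>%
    # syms() converts each name to a symbol; !!! splices them in as
    # separate arguments, i.e. interaction(lat, lon)
    mutate(id = interaction(!!!rlang::syms(unique_id_coords)))
}

with_id <- create_unique_id(mini_df, unique_id_coords = c("lat", "lon"))
```

As in the out-of-function example, id is a factor here. A base R alternative that avoids tidy evaluation altogether would be df$id <- interaction(df[unique_id_coords]), since interaction() also accepts a single list of vectors.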