As part of a much larger project, I am trying to create a new column in a data.frame called "unique_id" based on the interaction of user-specified variables. In this use case, the number of variables needed and their names will vary quite a bit with each user, so this flexibility is important. Some data.frames, such as the toy one in my example, will even come with a "unique id" variable, but this is quite rare. I included it in my example to be clear about what my desired output is.
Consider this toy data.frame:
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)
Outside of a proper function, it is quite easy to do something like this:
library(dplyr)
out_of_function_test_df <- mini_df %>%
  mutate(id = interaction(lat, lon))
Which produces what I want, namely:
    lat     lon station_id            id
1 41.23 -120.79        300 41.23.-120.79
2 37.37 -120.68        527 37.37.-120.68
3 41.23 -120.79        300 41.23.-120.79
4 39.01 -119.13        228 39.01.-119.13
5 32.00 -120.00         72       32.-120
I need this to work within a function in which the user specifies the interacting variables.
I have read many Stack Exchange posts that approach similar problems, but with important differences from mine. The other questions address verbs other than mutate, attempt to apply different functions, or do not address the issue of multiple user-specified variables.
After reading these, trying many things, and reading this, the best I can come up with is the following:
create_unique_id <- function(df,
                             metadata_coords,
                             unique_id_coords) {
  df <- df %>%
    mutate_(id = interp(~interaction(args), # causes error with or without tilde
                        args = c("list", lapply(unique_id_coords, as.name))))
  return(df)
}
This produces an error:
Error in unique.default(x, nmax = nmax) :
unique() applies only to vectors
Here is the full traceback, if it is helpful:
22. unique.default(x, nmax = nmax)
21. unique(x, nmax = nmax)
20. factor(x)
19. as.factor(args[[i]])
18. interaction(list("list", lat, lon))
17. mutate_impl(.data, dots, caller_env())
16. mutate.tbl_df(tbl_df(.data), ...)
15. mutate(tbl_df(.data), ...)
14. as.data.frame(mutate(tbl_df(.data), ...))
13. mutate.data.frame(.data, !!!dots)
12. mutate(.data, !!!dots)
11. mutate_.data.frame(., id = interp(~interaction(args), args = c("list",
        lapply(unique_id_coords, as.name))))
10. mutate_(., id = interp(~interaction(args), args = c("list", lapply(unique_id_coords,
        as.name))))
9. function_list[[k]](value)
8. withVisible(function_list[[k]](value))
7. freduce(value, `_function_list`)
6. `_fseq`(`_lhs`)
5. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
3. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
2. df %>% mutate_(id = interp(~interaction(args), args = c("list",
       lapply(unique_id_coords, as.name))))
1. create_unique_id(df = mini_df, metadata_coords = c("lat", "lon",
       "station_id"), unique_id_coords = c("lat", "lon"))
I do not have nearly enough background knowledge for this to be helpful to me. I am confused because the issue seems to be deep within the interaction() function. interaction() calls unique() along the way (which makes sense), but unique() is what ends up failing. Somehow, the call to interaction() inside my function differs from the call outside the function, but I am not sure how.
CodePudding user response:
Does this application of tidyr::unite() help?
create_unique_id <- function(df,
                             unique_id_coords) {
  df <- df %>%
    tidyr::unite("id", {{ unique_id_coords }}, remove = FALSE)
  return(df)
}
Result
create_unique_id(mtcars, unique_id_coords = c(wt, qsec)) %>% head()
                   mpg cyl disp  hp drat          id    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90  2.62_16.46 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875_17.02 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85  2.32_18.61 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215_19.44 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15  3.44_17.02 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76  3.46_20.22 3.460 20.22  1  0    3    1
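The function above expects bare column names (c(wt, qsec)). If the names arrive as strings, as in the question (unique_id_coords = c("lat", "lon")), one possible variant wraps the character vector in all_of(). This is only a minimal sketch: the function name create_unique_id_chr is made up for illustration, and mini_df is the toy data from the question.

```r
library(dplyr)  # %>% and all_of()
library(tidyr)  # unite()

# Hypothetical variant: column names are passed as a character vector
create_unique_id_chr <- function(df, unique_id_coords) {
  df %>%
    # all_of() selects exactly the named columns and errors if one is missing
    unite("id", all_of(unique_id_coords), remove = FALSE)
}

# Toy data from the question
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

out <- create_unique_id_chr(mini_df, c("lat", "lon"))
```

Rows 1 and 3 share the same coordinates, so they should end up with the same id. unite() joins values with "_" by default; passing sep = "." would mimic the separator that interaction() uses.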
CodePudding user response:
Here is another option using the ellipsis (...):
library(dplyr)
library(rlang)
create_unique_id <- function(df, ...) {
  df %>%
    mutate(id = paste(!!!ensyms(...), sep = "_"))
}
Output
create_unique_id(mtcars, cyl, hp, vs) %>% head()
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb      id
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 6_110_0
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 6_110_0
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  4_93_1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 6_110_1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 8_175_0
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 6_105_1
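If you would rather keep interaction() and pass the column names as strings, the same splicing idea should carry over. The traceback in the question shows interaction(list("list", lat, lon)), which suggests that interaction() received unevaluated symbols rather than column vectors, and as.factor() cannot work on a symbol. A minimal sketch, assuming reasonably current versions of dplyr and rlang, converts the strings with syms() and splices them with !!!. The function name create_unique_id_interaction is hypothetical, and mini_df is the question's toy data.

```r
library(dplyr)
library(rlang)

# Hypothetical variant: column names are passed as a character vector
create_unique_id_interaction <- function(df, unique_id_coords) {
  df %>%
    # syms() turns c("lat", "lon") into symbols; !!! splices them in as
    # separate arguments, i.e. interaction(lat, lon, drop = TRUE)
    mutate(id = interaction(!!!syms(unique_id_coords), drop = TRUE))
}

# Toy data from the question
mini_df <- data.frame(
  lat = c(41.23, 37.37, 41.23, 39.01, 32.00),
  lon = c(-120.79, -120.68, -120.79, -119.13, -120.00),
  station_id = c(300, 527, 300, 228, 72)
)

res <- create_unique_id_interaction(mini_df, c("lat", "lon"))
```

The resulting id column is a factor; wrap it in as.character() if a plain string is preferred. In base R, interaction(df[unique_id_coords], drop = TRUE) should give the same grouping, since interaction() also accepts a single list of factors.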