I have some integer-based survey data with N = 10,000 observations and X = 6 variables. I would like to encode each of the 10,000 observations as a single string built from its 6 variables.
Could someone please show me how to do this in R, such that rows with duplicate strings receive the same assignment? A small example data set:
set.seed(1)
dat <- data.frame(
  matrix(
    sample(1:3, 5 * 12, replace = TRUE), 12, 5,
    dimnames = list(1:12, c("X1", "X2", "X3", "X4", "X5"))
  ),
  Sex = rep(c("Male", "Female"))
)
CodePudding user response:
We can paste the columns together with a specific sep, e.g.,
> do.call(paste, c(dat, sep = ", "))
[1] "1, 1, 1, 2, 2, Male" "3, 1, 2, 2, 3, Female" "1, 2, 1, 2, 3, Male"
[4] "2, 2, 1, 3, 2, Female" "1, 2, 2, 2, 2, Male" "3, 2, 2, 1, 2, Female"
[7] "3, 3, 2, 3, 2, Male" "2, 1, 1, 2, 1, Female" "2, 3, 3, 1, 2, Male"
[10] "3, 1, 1, 1, 2, Female" "3, 1, 3, 3, 2, Male" "1, 1, 2, 2, 2, Female"
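The pasted strings alone do not yet give duplicate rows a shared code. One possible way to get there, sketched on a small made-up data frame (`toy`, `key` and `id` are hypothetical names, not from the question), is to match each string against the unique strings, which numbers the groups in order of first appearance:

```r
# Hypothetical toy data: rows 1 and 3 are exact duplicates
toy <- data.frame(
  X1  = c(3L, 1L, 3L, 2L),
  X2  = c(1L, 3L, 1L, 2L),
  Sex = c("Female", "Male", "Female", "Male")
)

# Build one string per row, as in the answer above
key <- do.call(paste, c(toy, sep = ", "))

# Position of each string among the unique strings:
# identical rows get the same integer
id <- match(key, unique(key))
id
# [1] 1 2 1 3
```

The separator matters more once values can contain the separator itself, so it is probably safest to pick one that cannot occur in the data.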
CodePudding user response:
Paste all columns, convert to factor, then convert to integer:
dat$id <- as.integer(factor(apply(dat, 1, paste, collapse = "_")))
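The example data in the question happens to contain no duplicate rows, so here is a quick check of the same idea on a small made-up data frame with a known duplicate (`toy` and `ids` are hypothetical names used only for this sketch). Note that the integers follow the sorted order of the factor levels, not the order in which rows appear:

```r
# Hypothetical toy data: rows 1 and 3 are exact duplicates
toy <- data.frame(
  X1  = c(3L, 1L, 3L, 2L),
  X2  = c(1L, 3L, 1L, 2L),
  Sex = c("Female", "Male", "Female", "Male")
)

# Same recipe as above: paste each row, convert to factor, then to integer
ids <- as.integer(factor(apply(toy, 1, paste, collapse = "_")))
ids
# [1] 3 1 3 2

# Rows 1 and 3 share an id; the sorted levels are
# "1_3_Male" < "2_2_Male" < "3_1_Female"
toy$id <- ids
```

If the numbering itself is meaningless and only group membership matters, the sorted ordering should make no practical difference.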
CodePudding user response:
Using tidyr::unite():
library(tidyr)
unite(dat, "X", everything(), sep = " ")
X
1 1 1 1 2 2 Male
2 3 1 2 2 3 Female
3 1 2 1 2 3 Male
4 2 2 1 3 2 Female
5 1 2 2 2 2 Male
6 3 2 2 1 2 Female
7 3 3 2 3 2 Male
8 2 1 1 2 1 Female
9 2 3 3 1 2 Male
10 3 1 1 1 2 Female
11 3 1 3 3 2 Male
12 1 1 2 2 2 Female
CodePudding user response:
An option with exec and str_c:
library(purrr)
library(stringr)
library(dplyr)
dat %>%
  transmute(X = exec(str_c, !!!rlang::syms(names(.)), sep = " "))
Output:
X
1 1 1 1 2 2 Male
2 3 1 2 2 3 Female
3 1 2 1 2 3 Male
4 2 2 1 3 2 Female
5 1 2 2 2 2 Male
6 3 2 2 1 2 Female
7 3 3 2 3 2 Male
8 2 1 1 2 1 Female
9 2 3 3 1 2 Male
10 3 1 1 1 2 Female
11 3 1 3 3 2 Male
12 1 1 2 2 2 Female