Convert Rows to Strings

Time:11-15

I have some integer-based survey data with N = 10,000 observations and X = 6 variables. I would like to encode each of the 10,000 observations as a string of its 6 variable values.

Could someone please show me how to encode this in R, such that rows with duplicate strings take on the same assignment?

set.seed(1)

# 12 example rows: five integer items (X1..X5) plus Sex
dat <- data.frame(
  matrix(
    sample(1:3, 5 * 12, replace = TRUE), 12, 5,
    dimnames = list(1:12, c("X1", "X2", "X3", "X4", "X5"))
  ),
  Sex = rep(c("Male", "Female"), times = 6)
)

CodePudding user response:

We can paste the columns together row by row with a chosen separator, e.g.,

> do.call(paste, c(dat, sep = ", "))
 [1] "1, 1, 1, 2, 2, Male"   "3, 1, 2, 2, 3, Female" "1, 2, 1, 2, 3, Male"
 [4] "2, 2, 1, 3, 2, Female" "1, 2, 2, 2, 2, Male"   "3, 2, 2, 1, 2, Female"
 [7] "3, 3, 2, 3, 2, Male"   "2, 1, 1, 2, 1, Female" "2, 3, 3, 1, 2, Male"
[10] "3, 1, 1, 1, 2, Female" "3, 1, 3, 3, 2, Male"   "1, 1, 2, 2, 2, Female"
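
To give rows with identical strings the same assignment, as the question asks, the pasted key can then be mapped to an integer. A minimal sketch in base R, using a small hypothetical data frame `toy` (not the survey data) so that the duplicate rows are easy to see:

```r
# Hypothetical toy data: rows 1 and 3 are identical, as are rows 2 and 4.
toy <- data.frame(
  X1  = c(1, 3, 1, 3, 2),
  X2  = c(2, 1, 2, 1, 2),
  Sex = c("Male", "Female", "Male", "Female", "Male")
)

# One string per row, built column-wise with paste().
key <- do.call(paste, c(toy, sep = ", "))

# Number the distinct strings in order of first appearance.
toy$id <- match(key, unique(key))

toy$id
```

Here `match(key, unique(key))` is just one way to number the patterns. It is worth picking a separator that cannot occur inside the values, otherwise two different rows could paste to the same string.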

CodePudding user response:

Paste all columns, convert to factor, then convert to integer:

dat$id <- as.integer(factor(apply(dat, 1, paste, collapse = "_")))
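
Note that integer codes obtained from a factor follow the sorted order of the strings, not the order in which each pattern first appears. A small sketch comparing the two numberings on a hypothetical data frame `toy` (the names `id_sorted` and `id_first` are made up for illustration):

```r
# Hypothetical toy data: rows 1 and 3 match, rows 2 and 4 match.
toy <- data.frame(
  X1  = c(3, 1, 3, 1, 2),
  X2  = c(1, 2, 1, 2, 2),
  Sex = c("Female", "Male", "Female", "Male", "Male")
)

# Row-wise paste, as in the answer above.
key <- apply(toy, 1, paste, collapse = "_")

# Codes follow the sorted factor levels.
id_sorted <- as.integer(factor(key))

# Codes follow the order of first appearance.
id_first <- match(key, unique(key))

data.frame(key, id_sorted, id_first)
```

Both columns group the rows identically; only the labels differ, so either works when the id is used purely as a grouping variable.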

CodePudding user response:

Using tidyr::unite():

library(tidyr)

unite(dat, "X", everything(), sep = " ")
                  X
1    1 1 1 2 2 Male
2  3 1 2 2 3 Female
3    1 2 1 2 3 Male
4  2 2 1 3 2 Female
5    1 2 2 2 2 Male
6  3 2 2 1 2 Female
7    3 3 2 3 2 Male
8  2 1 1 2 1 Female
9    2 3 3 1 2 Male
10 3 1 1 1 2 Female
11   3 1 3 3 2 Male
12 1 1 2 2 2 Female

CodePudding user response:

An option with exec() and str_c():

library(purrr)
library(stringr)
library(dplyr)
dat %>%
   transmute(X= exec(str_c, !!! rlang::syms(names(.)), sep = " "))

-output

                  X
1    1 1 1 2 2 Male
2  3 1 2 2 3 Female
3    1 2 1 2 3 Male
4  2 2 1 3 2 Female
5    1 2 2 2 2 Male
6  3 2 2 1 2 Female
7    3 3 2 3 2 Male
8  2 1 1 2 1 Female
9    2 3 3 1 2 Male
10 3 1 1 1 2 Female
11   3 1 3 3 2 Male
12 1 1 2 2 2 Female