I collected some data from a survey that asked respondents to rank their preferences for players' profiles:
profile1: Tom, center, pitcher
profile2: Pete, right, hitter
profile3: Clay, left, hitter
profile4: Tom, right, fielder
profile5: Pete, left, fielder
profile6: Clay, center, pitcher
However, because I am unfamiliar with the questionnaire development software, the responses I collected are stored as multi-line string values like the following (one per respondent), which are then read into R:
preferences <- data.frame(pref = c("1. Pete, right, hitter\n2. Clay, center, pitcher\n3. Tom, right, fielder\n4. Tom, center, pitcher\n5. Clay, left, hitter\n6. Pete, left, fielder",
"1. Tom, right, fielder\n2. Clay, center, pitcher\n3. Pete, left, fielder\n4. Pete, right, hitter\n5. Tom, center, pitcher\n6. Clay, left, hitter",
"1. Clay, left, hitter\n2. Tom, center, pitcher\n3. Pete, right, hitter\n4. Pete, left, fielder\n5. Clay, center, pitcher\n6. Tom, right, fielder"))
I'm wondering if there is any way to map each of a respondent's ranked choices to distinct column values corresponding to players' profiles given above, kind of like one-hot-encoding (OHE), and turn the result into the following format:
df <- data.frame(profile1 = c(4, 5, 2), profile2 = c(1, 4, 3), profile3 = c(5, 6, 1), profile4 = c(3, 1, 6), profile5 = c(6, 3, 4), profile6 = c(2, 2, 5))
df
  profile1 profile2 profile3 profile4 profile5 profile6
1        4        1        5        3        6        2
2        5        4        6        1        3        2
3        2        3        1        6        4        5
Any suggestions would be appreciated.
CodePudding user response:
preferences <- data.frame(pref = c("1. Pete, right, hitter\n2. Clay, center, pitcher\n3. Tom, right, fielder\n4. Tom, center, pitcher\n5. Clay, left, hitter\n6. Pete, left, fielder",
"1. Tom, right, fielder\n2. Clay, center, pitcher\n3. Pete, left, fielder\n4. Pete, right, hitter\n5. Tom, center, pitcher\n6. Clay, left, hitter",
"1. Clay, left, hitter\n2. Tom, center, pitcher\n3. Pete, right, hitter\n4. Pete, left, fielder\n5. Clay, center, pitcher\n6. Tom, right, fielder"), stringsAsFactors = F)
profiles <- c(
"Tom, center, pitcher",
"Pete, right, hitter",
"Clay, left, hitter",
"Tom, right, fielder",
"Pete, left, fielder",
"Clay, center, pitcher"
)
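library(stringr)  # for str_replace_all()
df <- data.frame(do.call(rbind, lapply(preferences$pref, function(x) {
  # position of each profile within this respondent's ordering, i.e. its rank
  match(
    profiles,
    # split on newlines, then drop the leading "1. ", "2. ", ... prefixes
    str_replace_all(strsplit(x, "\\n")[[1]], "^[0-9]+\\. ", "")
  )
})))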
names(df) <- paste0("profile", 1:length(profiles))
df
#   profile1 profile2 profile3 profile4 profile5 profile6
# 1        4        1        5        3        6        2
# 2        5        4        6        1        3        2
# 3        2        3        1        6        4        5
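To see why the call is `match(profiles, ranked)` and not `match(ranked, profiles)`, here is a minimal base R sketch for the first respondent only. The names `ranked`, `rank_of_profile` and `profile_at_rank` are illustrative, and `sub()` stands in for `str_replace_all()` so that no package is needed.

```r
profiles <- c(
  "Tom, center, pitcher", "Pete, right, hitter", "Clay, left, hitter",
  "Tom, right, fielder", "Pete, left, fielder", "Clay, center, pitcher"
)

# First respondent's answer, as given in the question
x <- "1. Pete, right, hitter\n2. Clay, center, pitcher\n3. Tom, right, fielder\n4. Tom, center, pitcher\n5. Clay, left, hitter\n6. Pete, left, fielder"

# Split on newlines and drop the leading "1. ", "2. ", ... prefixes
ranked <- sub("^[0-9]+\\. ", "", strsplit(x, "\n", fixed = TRUE)[[1]])

# Position of each profile within the ranking = rank given to that profile
rank_of_profile <- match(profiles, ranked)

# The reverse lookup answers a different question:
# which profile number sits at each rank
profile_at_rank <- match(ranked, profiles)

rank_of_profile   # 4 1 5 3 6 2
profile_at_rank   # 2 6 4 1 3 5
```

The first vector is the row wanted in the result (one column per profile, holding its rank); the second would instead give one column per rank.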
CodePudding user response:
You can create a lookup table with the profiles (lookup) and then manipulate the preferences object like this:
library(dplyr)
library(tidyr)
library(stringr)
# Create data frame with six columns using `tstrsplit` (from data.table)
df <- setNames(as.data.frame(data.table::tstrsplit(preferences$pref, "\\n")), paste0("profile", 1:6))
# pivot longer and merge with lookup, then pivot back to wide
df %>% mutate(id = row_number()) %>%
  pivot_longer(starts_with("profile"), names_prefix = "profile") %>%
  mutate(value = str_remove(value, "^\\d+[.] ")) %>%
  inner_join(lookup, by = c("value" = "text")) %>%
  pivot_wider(id_cols = id, names_from = profile, values_from = name, names_sort = TRUE, names_prefix = "profile") %>%
  select(-id)
Output:
  profile1 profile2 profile3 profile4 profile5 profile6
  <chr>    <chr>    <chr>    <chr>    <chr>    <chr>
1 4        1        5        3        6        2
2 5        4        6        1        3        2
3 2        3        1        6        4        5
Input (lookup table):
lookup <- structure(list(profile = c("1", "2", "3", "4", "5", "6"), text = c("Tom, center, pitcher",
"Pete, right, hitter", "Clay, left, hitter", "Tom, right, fielder",
"Pete, left, fielder", "Clay, center, pitcher")), row.names = c(NA,
-6L), class = "data.frame")
The lookup table looks like this:
  profile                  text
1       1  Tom, center, pitcher
2       2   Pete, right, hitter
3       3    Clay, left, hitter
4       4   Tom, right, fielder
5       5   Pete, left, fielder
6       6 Clay, center, pitcher
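The columns in the output above are character (`<chr>`), because the ranks are taken from the `name` column that `pivot_longer` builds out of the column names. If numeric ranks are needed, one option is to convert them afterwards. A minimal base R sketch, where `res` is a hypothetical stand-in for the pipeline's result:

```r
# Hypothetical stand-in for the pipeline output: ranks stored as strings
res <- data.frame(
  profile1 = c("4", "5", "2"), profile2 = c("1", "4", "3"),
  profile3 = c("5", "6", "1"), profile4 = c("3", "1", "6"),
  profile5 = c("6", "3", "4"), profile6 = c("2", "2", "5"),
  stringsAsFactors = FALSE
)

# Convert every column in place; `res[] <-` keeps the data frame structure
res[] <- lapply(res, as.integer)

str(res)
```

Within the pipeline itself, appending `mutate(across(everything(), as.integer))` after `select(-id)` should have the same effect.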
CodePudding user response:
In base R you could do the following.
First, read in your profiles:
text <- "profile1: Tom, center, pitcher
profile2: Pete, right, hitter
profile3: Clay, left, hitter
profile4: Tom, right, fielder
profile5: Pete, left, fielder
profile6: Clay, center, pitcher"
a <- read.dcf(textConnection(text), all = TRUE)
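Note that if your profiles are in a file, you can use a <- read.dcf('file.name', all = TRUE) instead.
Then strip the rank prefixes, split each response on newlines, and match against the profiles:
b <- strsplit(gsub("\\d+\\. ", '', preferences$pref), '\n')
setNames(data.frame(t(mapply(match, list(a), b))), names(a))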
  profile1 profile2 profile3 profile4 profile5 profile6
1        4        1        5        3        6        2
2        5        4        6        1        3        2
3        2        3        1        6        4        5
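Because `match()` returns `NA` when a string has no exact counterpart (for example, stray whitespace in a response), it may be worth checking the result before using it. A minimal sketch, assuming each respondent must rank every profile exactly once; the helper `is_valid_ranking` is hypothetical and the data frame is the expected result from the question:

```r
# Expected result from the question, used here as example input
df <- data.frame(profile1 = c(4, 5, 2), profile2 = c(1, 4, 3),
                 profile3 = c(5, 6, 1), profile4 = c(3, 1, 6),
                 profile5 = c(6, 3, 4), profile6 = c(2, 2, 5))

# A valid row has no NA and contains each rank 1..k exactly once
is_valid_ranking <- function(r) {
  !anyNA(r) && identical(sort(as.integer(r)), seq_along(r))
}

# One logical per respondent
valid <- apply(df, 1, is_valid_ranking)
valid

# Rows that failed the check, if any
which(!valid)
```

A row with an `NA` or a repeated rank would show up in `which(!valid)` and can then be inspected against the raw string in `preferences$pref`.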