BACKGROUND: We asked each participant to identify multiple emotions and then collected data on each of those emotions, such that there is a column for the first emotion they identified, second, and so on, and then individual columns for each of the follow up questions for each of the emotions. In wide format, it looks something like this:
rows <- 1:4
cols <- c("PID", "Stage", "Emo1_", "Emo2_",
"Emo1_Intense", "Emo2_Intense",
"Emo1_Desc", "Emo2_Desc", "Keyword")
df <- data.frame(matrix(NA,
nrow = length(rows),
ncol = length(cols),
dimnames = list(rows, cols)))
df$PID <- c("A-001", "A-002", "A-003", "A-004")
df$Stage <- c("Beginning", "End", "Middle", "Middle")
df$Emo1_ <- c("Fear", "Sadness", "Happy", "Anger")
df$Emo2_ <- c("Content", "Depressed", "Lost", "Sad")
df$Emo1_Intense <- 5:8
df$Emo2_Intense <- 1:4
df$Emo1_Desc <- c("E", "F", "G", "H")
df$Emo2_Desc <- c("A", "B", "C", "D")
df$Keyword <- c("Bus", "Ceiling", "Chainsaw", "Floor")
# PID Stage Emo1_ Emo2_ Emo1_Intense Emo2_Intense Emo1_Desc Emo2_Desc Keyword
#1 A-001 Beginning Fear Content 5 1 E A Bus
#2 A-002 End Sadness Depressed 6 2 F B Ceiling
#3 A-003 Middle Happy Lost 7 3 G C Chainsaw
#4 A-004 Middle Anger Sad 8 4 H D Floor
PROBLEM: I'm having a brain fart and cannot figure out how to convert this dataframe to the format below, wherein we have single columns that each capture: 1.) Which position an emotion was named, 2.) Which emotion was named, 3.) Each of the follow up questions for each emotion:
rows <- 1:8
cols <- c("PID", "Stage", "Number", "Emo", "Intense", "Desc", "Keyword")
df <- data.frame(matrix(NA,
nrow = length(rows),
ncol = length(cols),
dimnames = list(rows, cols)))
df$PID <- sort(rep(c("A-001", "A-002", "A-003", "A-004"), 2))
df$Stage <- sort(rep(c("Beginning", "Middle", "Middle", "End"), 2))
df$Number <- rep(1:2, 4)
df$Emo <- c("Fear", "Content", "Sadness", "Depressed", "Happy", "Lost", "Anger", "Sad")
df$Intense <- c(5,1,6,2,7,3,4,8)
df$Desc <- c("E", "A", "F", "B", "G", "C", "H", "D")
df$Keyword <- rep(c("Bus", "Ceiling", "Chainsaw", "Floor"),2)
# PID Stage Number Emo Intense Desc Keyword
#1 A-001 Beginning 1 Fear 5 E Bus
#2 A-001 Beginning 2 Content 1 A Ceiling
#3 A-002 End 1 Sadness 6 F Chainsaw
#4 A-002 End 2 Depressed 2 B Floor
#5 A-003 Middle 1 Happy 7 G Bus
#6 A-003 Middle 2 Lost 3 C Ceiling
#7 A-004 Middle 1 Anger 4 H Chainsaw
#8 A-004 Middle 2 Sad 8 D Floor
I could do this manually, but it's a much larger dataset than this and I know I've used pivot_longer a thousand times to do this simply; I'm just struggling hard to make it work right now. It's either too conservative or too liberal in its column consolidation and I'm having trouble finding the balance.
Naming conventions are arbitrary. If reformatting in another way might work better, please be my guest!
CodePudding user response:
How about this solution?
Needed to update the names of the columns that ended with "_" and some polishing of the number's column. I'm sure this could be accomplished in a single line.
#rename columns that end with _
torename<-grep("(Emo._)$", names(df))
names(df)[torename] <- paste0(names(df)[torename], "Emo")
answer<- pivot_longer(df, cols= starts_with("Emo"), names_to=c( "Number", ".value"),
names_sep = "_", names_repair="unique")
#clean-up the Number column
answer$Number <- gsub("Emo", "", answer$Number)
answer
# A tibble: 8 × 7
PID Stage Keyword Number Emo Intense Desc
<chr> <chr> <chr> <chr> <chr> <int> <chr>
1 A-001 Beginning Bus 1 Fear 5 E
2 A-001 Beginning Bus 2 Content 1 A
3 A-002 End Ceiling 1 Sadness 6 F
4 A-002 End Ceiling 2 Depressed 2 B
5 A-003 Middle Chainsaw 1 Happy 7 G
6 A-003 Middle Chainsaw 2 Lost 3 C
7 A-004 Middle Floor 1 Anger 8 H
8 A-004 Middle Floor 2 Sad 4 D