Some fellow students and I created a Qualtrics survey for the course Judicial Lawmaking. We worked with 4 case vignettes. Each respondent first answered some general questions and then answered one case. They were first asked whether alimony should be granted and, in a second question, how much. Only those who answered yes saw this second question. We have now imported the data into R. Since each respondent answered only 1 case and the other 3 were left open, there are a lot of missing values. I am trying to create a dataset without all the unanswered questions. However, I only manage to keep the yes answers. Alternatively, I managed to remove the NAs, but then the first question no longer seems to be linked to the second question (if Q7 was answered yes, the next column should be Q8, but I see the first column says Q7 and the second column says Q12, for example). I will add the code I wrote, but I am a law student, so my understanding of all this is rather limited. I added a simplified example. The numbers from 1 to 4 represent the 4 different cases.
age <- c("18-30","18-30","31-45", 60)
YesNo1 <- c("Yes", NA,NA,NA)
Height1 <- c(250,NA,NA,NA)
YesNo2 <- c(NA,"NO",NA,NA)
Height2 <- c(NA,NA,NA,NA)
YesNo3 <- c(NA,NA,"Yes", NA)
Height3 <- c(NA,NA,320,NA)
YesNo4 <- c(NA,NA,NA,"yes")
Height4 <- c(NA,NA,NA, 290)
Test <- data.frame(age, YesNo1, Height1, YesNo2, Height2,
YesNo3, Height3, YesNo4,Height4)
#inspect the data
Test
# reduce the columns (pivot_longer, drop_na and %>% come from tidyr)
library(tidyr)
mi <- pivot_longer(Test, c(YesNo1, YesNo2, YesNo3, YesNo4),
                   names_to = "decision", values_to = "yes/no")
mi1 <- pivot_longer(mi, c(Height1, Height2, Height3, Height4),
                    names_to = "alimony", values_to = "height")
# drop the rows where the yes/no answer is NA
mi2 <- mi1 %>% drop_na('yes/no')
In an ideal world I would like to have one dataset with the general questions, followed by a column with the number of the yes/no question and a column with its answer, and then a column with the number of the question on how much alimony should be granted and a column with its answer. The numbers of the questions should always match (7 and 8, 9 and 10, ...). I hope this is clear and someone can help me with it. I translated my problem into a simplified version. When you run it in R, you can see Yes 4 times and No 4 times. I only want to keep each of them once. But I can't delete the remaining rows that contain NA, since that would also delete the question that was answered No. Do you have any idea how I can fix this, please?
CodePudding user response:
Apparently you want to use tidyr. I am not well versed in the tidyverse, so I'd like to show you an approach using base R and the stack function. Taking your data example
age <- c("18-30","18-30","31-45", 60)
YesNo1 <- c("Yes", NA,NA,NA)
Height1 <- c(250,NA,NA,NA)
YesNo2 <- c(NA,"NO",NA,NA)
Height2 <- c(NA,NA,NA,NA)
YesNo3 <- c(NA,NA,"Yes", NA)
Height3 <- c(NA,NA,320,NA)
YesNo4 <- c(NA,NA,NA,"yes")
Height4 <- c(NA,NA,NA, 290)
Test <- data.frame(age, YesNo1, Height1, YesNo2, Height2,
YesNo3, Height3, YesNo4,Height4)
we can now stack the YesNo columns and the Height columns on top of each other, calling the result stacked:
stacked <- data.frame(age = Test$age,
yesno = stack(Test, select = c("YesNo1", "YesNo2", "YesNo3", "YesNo4")),
height = stack(Test, select = c("Height1", "Height2", "Height3", "Height4"))
)
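For readers who have not used stack before, here is a minimal sketch of what it does on its own. The data frame df and its columns a and b are made up for this illustration and are not part of the survey data.

```r
# Tiny illustrative data frame (hypothetical, not the survey data)
df <- data.frame(a = c(1, NA), b = c(NA, 4))

# stack() puts the columns underneath each other:
# 'values' holds the cell contents, 'ind' records the source column
s <- stack(df)
print(s)
#   values ind
# 1      1   a
# 2     NA   a
# 3     NA   b
# 4      4   b
```

Because the NA cells are kept, the stacked result has one row per original cell, which is why stacked contains so many NA at this point.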
If you print(stacked) you'll see a lot of NA. So in the next (and final) step, we delete all those rows that have an NA in the yesno.values column:
stacked <- stacked[!is.na(stacked$yesno.values),]
print(stacked)
And the result is what I understood from your question to be the goal:
> print(stacked)
     age yesno.values yesno.ind height.values height.ind
1  18-30          Yes    YesNo1           250    Height1
6  18-30           NO    YesNo2            NA    Height2
11 31-45          Yes    YesNo3           320    Height3
16    60          yes    YesNo4           290    Height4
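Since the question asked that the question numbers always match, a quick check can confirm the pairing. The following is only a sketch on the toy data. The helper names case_yesno and case_height and the extra column case are my own additions, not part of the original data.

```r
# Rebuild the toy data and the stacked result from above
Test <- data.frame(
  age     = c("18-30", "18-30", "31-45", "60"),
  YesNo1  = c("Yes", NA, NA, NA), Height1 = c(250, NA, NA, NA),
  YesNo2  = c(NA, "NO", NA, NA),  Height2 = c(NA, NA, NA, NA),
  YesNo3  = c(NA, NA, "Yes", NA), Height3 = c(NA, NA, 320, NA),
  YesNo4  = c(NA, NA, NA, "yes"), Height4 = c(NA, NA, NA, 290)
)
stacked <- data.frame(
  age    = Test$age,
  yesno  = stack(Test, select = c("YesNo1", "YesNo2", "YesNo3", "YesNo4")),
  height = stack(Test, select = c("Height1", "Height2", "Height3", "Height4"))
)
stacked <- stacked[!is.na(stacked$yesno.values), ]

# Strip the text prefix so that only the case number remains
case_yesno  <- sub("^YesNo", "", stacked$yesno.ind)
case_height <- sub("^Height", "", stacked$height.ind)

# Stops with an error if any row pairs questions from different cases
stopifnot(all(case_yesno == case_height))

# Optional: keep the case number as its own column
stacked$case <- as.integer(case_yesno)
print(stacked)
```

The pairing holds here because both stack calls list the columns in the same case order. If the two select vectors were ordered differently, the check would fail.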
Sorry for this not being a tidyverse answer. At least the No answer was kept in the data.
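For anyone who does want to stay in tidyr, here is a sketch under some assumptions: a tidyr version of 1.0.0 or later, and column names that follow the pattern of the toy data (YesNo or Height followed by the case number). The names long, answered and case are my own choices. The two separate pivot_longer calls in the question combine every YesNo column with every Height column, which would explain the repeated rows. A single call with the special ".value" name keeps the two answers of a case on the same row.

```r
library(tidyr)

# Same toy data as in the question (age stored as text throughout)
Test <- data.frame(
  age     = c("18-30", "18-30", "31-45", "60"),
  YesNo1  = c("Yes", NA, NA, NA), Height1 = c(250, NA, NA, NA),
  YesNo2  = c(NA, "NO", NA, NA),  Height2 = c(NA, NA, NA, NA),
  YesNo3  = c(NA, NA, "Yes", NA), Height3 = c(NA, NA, 320, NA),
  YesNo4  = c(NA, NA, NA, "yes"), Height4 = c(NA, NA, NA, 290)
)

# ".value" turns the first part of each name (YesNo / Height) into a column;
# the second part (the digits) becomes the new 'case' column
long <- pivot_longer(
  Test,
  cols = -age,
  names_to = c(".value", "case"),
  names_pattern = "(YesNo|Height)(\\d+)"
)

# Keep only the case each respondent actually answered (Yes or No)
answered <- drop_na(long, YesNo)
print(answered)
```

On the toy data this leaves one row per respondent, with the No answer retained and its Height left as NA. For the real survey the pattern would have to be adapted, since there the paired questions are named Q7/Q8, Q9/Q10 and so on.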
CodePudding user response:
This is your solution applied to my larger dataset, @bernhard:
Test <- read.csv2("Data2.csv", header = TRUE, sep = ",")
#inspect the data
Test
#select data
Test1 <- Test[,11:24]
# replace empty strings with NA
Test2 <- Test1
Test2[Test2 == ""] <- NA
stacked1 <- data.frame(Q1 = Test2$Q1, Q2 = Test2$Q2, Q3 = Test2$Q3,
Q4 = Test2$Q4, Q5 = Test2$Q5, Q6 = Test2$Q6,
yesno = stack(Test2, select = c("Q7", "Q9", "Q11", "Q13")),
height = stack(Test2, select = c("Q8", "Q10", "Q12", "Q14")))
stacked1[stacked1 == ""] <- NA
stacked1 <- stacked1[!is.na(stacked$yesno.values),]
print(stacked2)
As mentioned in my comment, the NAs do not disappear, but the code doesn't give an error either.
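Judging only from the code shown, one possible cause is the object names: the filter line builds its condition from stacked$yesno.values rather than stacked1$yesno.values, and the print call refers to stacked2. If objects with those names are still in the workspace from earlier attempts, R uses them silently, which would fit the absence of an error. Below is a sketch of the same steps with one name used consistently. The small Test2 here is a made-up stand-in with only one general question and two cases, since the real Data2.csv is not available.

```r
# Hypothetical stand-in for Test2: empty strings mark unanswered questions
Test2 <- data.frame(
  Q1  = c("a", "b", "c"),
  Q7  = c("Yes", "", ""),
  Q8  = c("250", "", ""),
  Q9  = c("", "No", "Yes"),
  Q10 = c("", "", "320")
)
Test2[Test2 == ""] <- NA

stacked1 <- data.frame(
  Q1     = Test2$Q1,
  yesno  = stack(Test2, select = c("Q7", "Q9")),
  height = stack(Test2, select = c("Q8", "Q10"))
)

# Build the filter from stacked1 itself, then print that same object
stacked1 <- stacked1[!is.na(stacked1$yesno.values), ]
print(stacked1)
```

With the filter taken from stacked1 itself, only rows with an answered yes/no question remain. The No row keeps an NA in height.values, which is expected, because those respondents never saw the second question.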