I received a excel file like this:
structure(list(`Variable Name` = c("form.hhConsent", NA, NA,
"form.hhReasonRefused", NA, NA, NA, "form.childrenRepeat.childrenGroup.childSex",
NA, NA, "form.childrenRepeat.childrenGroup.dobSourceWho", NA,
NA, NA, NA, "form.hohSex", NA, NA, "form.mothersRepeat.mothersGroup.spouseSex",
NA, NA, "form.childrenRepeat.childrenGroup.treatmentConsent",
NA, NA, NA, NA, "form.childrenRepeat.childrenGroup.noTreatReason",
NA, NA, NA), Type = c(NA, "1", "0", NA, "1", "2", "99", NA, "1",
"2", NA, "1", "2", "3", "99", NA, "1", "2", NA, "1", "2", NA,
"1", "0", "2", "3", NA, "1", "2", "99"), Meaning = c(NA, "Yes",
"No", NA, "No responsible could be found", "They don't want to",
"Other reason", NA, "Male", "Female", NA, "Mother", "Father",
"Neighbor", "Others", NA, "Male", "Female", NA, "Male", "Female",
NA, "Yes", "No", "AMR grappe", "M54 grappe", NA, "allergy", "parent refusal",
"other"), Missing_values = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), Format_values = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -30L), class = "data.frame")
The current file looks like:
variable name type meaning
X1
xxx xxx
xxx xxx
X2
xxx. xxx
xxx xxx
xxx xxx
x3
xxx xxx
I want to convert it as regular dataset, which looks like:
variable name type meaning
x1 xxx. xxx
x1 xxx. xxx
x2 xxx xxx
x2 xxx. xxx
x2 xxx. xxx
x3 xxx. xxx
My thought is: first group by "variable name", and then filling up the missing variable name, the last step is filter out the head row without any value in col of type and meaning. But it does not work. The issue is that there is no var can be group_by.
Could someone help to figure out how to make it in R? Thanks a lot~~!
CodePudding user response:
We use fill
(from tidyr
) to replace the NA elements in the 'Variable Name' with the previous non-NA, then select
the first three columns, and use filter
with if_all
to keep only rows having all non-NA elements in the 'Type', 'Meaning' columns
library(dplyr)
library(tidyr)
df1 %>%
fill(`Variable Name`) %>%
select(1:3) %>%
filter(if_all(Type:Meaning, complete.cases))
-output
# A tibble: 22 × 3
`Variable Name` Type Meaning
<chr> <chr> <chr>
1 form.hhConsent 1 Yes
2 form.hhConsent 0 No
3 form.hhReasonRefused 1 No responsible could be found
4 form.hhReasonRefused 2 They don't want to
5 form.hhReasonRefused 99 Other reason
6 form.childrenRepeat.childrenGroup.childSex 1 Male
7 form.childrenRepeat.childrenGroup.childSex 2 Female
8 form.childrenRepeat.childrenGroup.dobSourceWho 1 Mother
9 form.childrenRepeat.childrenGroup.dobSourceWho 2 Father
10 form.childrenRepeat.childrenGroup.dobSourceWho 3 Neighbor
# … with 12 more rows
EDIT: Based on @Leon Samson's comments