Home > Net >  Fill up the missing column within group
Fill up the missing column within group

Time:10-04

I received a excel file like this:

structure(list(`Variable Name` = c("form.hhConsent", NA, NA, 
"form.hhReasonRefused", NA, NA, NA, "form.childrenRepeat.childrenGroup.childSex", 
NA, NA, "form.childrenRepeat.childrenGroup.dobSourceWho", NA, 
NA, NA, NA, "form.hohSex", NA, NA, "form.mothersRepeat.mothersGroup.spouseSex", 
NA, NA, "form.childrenRepeat.childrenGroup.treatmentConsent", 
NA, NA, NA, NA, "form.childrenRepeat.childrenGroup.noTreatReason", 
NA, NA, NA), Type = c(NA, "1", "0", NA, "1", "2", "99", NA, "1", 
"2", NA, "1", "2", "3", "99", NA, "1", "2", NA, "1", "2", NA, 
"1", "0", "2", "3", NA, "1", "2", "99"), Meaning = c(NA, "Yes", 
"No", NA, "No responsible could be found", "They don't want to", 
"Other reason", NA, "Male", "Female", NA, "Mother", "Father", 
"Neighbor", "Others", NA, "Male", "Female", NA, "Male", "Female", 
NA, "Yes", "No", "AMR grappe", "M54 grappe", NA, "allergy", "parent refusal", 
"other"), Missing_values = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), Format_values = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -30L), class = "data.frame")

The current file looks like:

variable name   type      meaning
X1
                xxx         xxx
                xxx         xxx     
X2
                xxx.        xxx
                xxx         xxx
                xxx         xxx
x3
                xxx         xxx 

I want to convert it as regular dataset, which looks like:

variable name   type   meaning
x1               xxx.   xxx
x1               xxx.   xxx
x2               xxx    xxx
x2               xxx.   xxx
x2               xxx.   xxx
x3               xxx.   xxx

My thought is: first group by "variable name", and then filling up the missing variable name, the last step is filter out the head row without any value in col of type and meaning. But it does not work. The issue is that there is no var can be group_by.

Could someone help to figure out how to make it in R? Thanks a lot~~!

CodePudding user response:

We use fill (from tidyr) to replace the NA elements in the 'Variable Name' with the previous non-NA, then select the first three columns, and use filter with if_all to keep only rows having all non-NA elements in the 'Type', 'Meaning' columns

library(dplyr)
library(tidyr)
df1 %>% 
  fill(`Variable Name`) %>%
  select(1:3) %>% 
  filter(if_all(Type:Meaning, complete.cases))

-output

# A tibble: 22 × 3
   `Variable Name`                                Type  Meaning                      
   <chr>                                          <chr> <chr>                        
 1 form.hhConsent                                 1     Yes                          
 2 form.hhConsent                                 0     No                           
 3 form.hhReasonRefused                           1     No responsible could be found
 4 form.hhReasonRefused                           2     They don't want to           
 5 form.hhReasonRefused                           99    Other reason                 
 6 form.childrenRepeat.childrenGroup.childSex     1     Male                         
 7 form.childrenRepeat.childrenGroup.childSex     2     Female                       
 8 form.childrenRepeat.childrenGroup.dobSourceWho 1     Mother                       
 9 form.childrenRepeat.childrenGroup.dobSourceWho 2     Father                       
10 form.childrenRepeat.childrenGroup.dobSourceWho 3     Neighbor                     
# … with 12 more rows

EDIT: Based on @Leon Samson's comments

  •  Tags:  
  • r
  • Related