Home > Blockchain >  Reshape dataframe - create rows as per data availability in R
Reshape dataframe - create rows as per data availability in R

Time:06-01

I want to reshape the original dataframe into the target dataframe as follows. enter image description here

But first, to recreate dataframes:

original <- data.frame(caseid = c("id101", 'id201', 'id202', 'id301', 'id302'),
                      age_child1 = c('3', '5', '8', NA, NA), 
                      age_child2 = c('1', '7', NA, NA, NA),
                      age_child3 = c('2', '6', '8', '3', NA))

target <-   data.frame(caseid = c('id101_1', 'id101_2', 'id101_3', 'id201_1', 'id201_2', 'id201_3', 'id202_1', 'id202_3', 'id301_3'),
                       age    = c(3, 1, 2, 5, 7, 6, 8, 8, 3))

The caseid column represents mothers. I want to create a new caseid row per each of the children and add the respective 'age' value to the age column. If no 'age' value is available, it means there is not an n child and no new row should be created.

Thanks for the help!

CodePudding user response:

You can use pivot_longer() and its various helpful options:

pivot_longer(original, cols = starts_with("age"), names_prefix = "age_child",values_to = "age",values_transform = as.integer) %>%
  filter(!is.na(age)) %>%
  mutate(caseid = paste0(caseid,"_",name)) %>%
  select(-name)

Output:

# A tibble: 9 × 2
  caseid    age
  <chr>   <int>
1 id101_1     3
2 id101_2     1
3 id101_3     2
4 id201_1     5
5 id201_2     7
6 id201_3     6
7 id202_1     8
8 id202_3     8
9 id301_3     3

CodePudding user response:

original %>%
  pivot_longer(-caseid, names_to = 'child', names_pattern = '([0-9] $)',
               values_to = 'age', values_drop_na = TRUE)%>%
  unite(caseid, caseid, child)

# A tibble: 9 x 2
  caseid  age  
  <chr>   <chr>
1 id101_1 3    
2 id101_2 1    
3 id101_3 2    
4 id201_1 5    
5 id201_2 7    
6 id201_3 6    
7 id202_1 8    
8 id202_3 8    
9 id301_3 3  

CodePudding user response:

Using reshape form base r ,

original <- data.frame(caseid = c("id101", 'id201', 'id202', 'id301', 'id302'),
                       age_child1 = c('3', '5', '8', NA, NA), 
                       age_child2 = c('1', '7', NA, NA, NA),
                       age_child3 = c('2', '6', '8', '3', NA))

a  <- reshape(original , varying = c("age_child1" , "age_child2" , "age_child3") ,
              direction = "long" ,
              times = c("_1" , "_2" , "_3") ,
              v.names = "age")

a$caseid <- paste0(a$caseid , a$time)
a <- a[order(a$caseid) , ][c("caseid" , "age")]
a <- na.omit(a)
row.names(a) <- NULL
a
#>    caseid age
#> 1 id101_1   3
#> 2 id101_2   1
#> 3 id101_3   2
#> 4 id201_1   5
#> 5 id201_2   7
#> 6 id201_3   6
#> 7 id202_1   8
#> 8 id202_3   8
#> 9 id301_3   3

Created on 2022-06-01 by the reprex package (v2.0.1)

  • Related