I want to reshape the original dataframe into the target dataframe as follows.
But first, to recreate dataframes:
original <- data.frame(caseid = c("id101", 'id201', 'id202', 'id301', 'id302'),
age_child1 = c('3', '5', '8', NA, NA),
age_child2 = c('1', '7', NA, NA, NA),
age_child3 = c('2', '6', '8', '3', NA))
target <- data.frame(caseid = c('id101_1', 'id101_2', 'id101_3', 'id201_1', 'id201_2', 'id201_3', 'id202_1', 'id202_3', 'id301_3'),
age = c(3, 1, 2, 5, 7, 6, 8, 8, 3))
The caseid column represents mothers. I want to create a new caseid row per each of the children and add the respective 'age' value to the age column. If no 'age' value is available, it means there is not an n child and no new row should be created.
Thanks for the help!
CodePudding user response:
You can use pivot_longer()
and its various helpful options:
pivot_longer(original, cols = starts_with("age"), names_prefix = "age_child",values_to = "age",values_transform = as.integer) %>%
filter(!is.na(age)) %>%
mutate(caseid = paste0(caseid,"_",name)) %>%
select(-name)
Output:
# A tibble: 9 × 2
caseid age
<chr> <int>
1 id101_1 3
2 id101_2 1
3 id101_3 2
4 id201_1 5
5 id201_2 7
6 id201_3 6
7 id202_1 8
8 id202_3 8
9 id301_3 3
CodePudding user response:
original %>%
pivot_longer(-caseid, names_to = 'child', names_pattern = '([0-9] $)',
values_to = 'age', values_drop_na = TRUE)%>%
unite(caseid, caseid, child)
# A tibble: 9 x 2
caseid age
<chr> <chr>
1 id101_1 3
2 id101_2 1
3 id101_3 2
4 id201_1 5
5 id201_2 7
6 id201_3 6
7 id202_1 8
8 id202_3 8
9 id301_3 3
CodePudding user response:
Using reshape
form base r ,
original <- data.frame(caseid = c("id101", 'id201', 'id202', 'id301', 'id302'),
age_child1 = c('3', '5', '8', NA, NA),
age_child2 = c('1', '7', NA, NA, NA),
age_child3 = c('2', '6', '8', '3', NA))
a <- reshape(original , varying = c("age_child1" , "age_child2" , "age_child3") ,
direction = "long" ,
times = c("_1" , "_2" , "_3") ,
v.names = "age")
a$caseid <- paste0(a$caseid , a$time)
a <- a[order(a$caseid) , ][c("caseid" , "age")]
a <- na.omit(a)
row.names(a) <- NULL
a
#> caseid age
#> 1 id101_1 3
#> 2 id101_2 1
#> 3 id101_3 2
#> 4 id201_1 5
#> 5 id201_2 7
#> 6 id201_3 6
#> 7 id202_1 8
#> 8 id202_3 8
#> 9 id301_3 3
Created on 2022-06-01 by the reprex package (v2.0.1)