I have a panel data frame that records how many children each woman has in each survey year. I would like to delete all women who never have children, while keeping women who, for example, had no child in 2016 but had one in 2018. Here's part of the data frame for reference:
ID year child
1 2012 0
1 2014 0
1 2016 1
2 2012 0
2 2014 0
2 2016 0
3 2014 1
3 2016 1
4 2012 0
4 2016 1
4 2018 2
5 2018 0
5 2020 0
Can someone help me delete all women who are not mothers?
CodePudding user response:
dplyr option:
library(dplyr)
df %>%
group_by(ID) %>%
filter(sum(child) >= 1)
Output:
# A tibble: 8 × 3
# Groups: ID [3]
ID year child
<dbl> <dbl> <dbl>
1 1 2012 0
2 1 2014 0
3 1 2016 1
4 3 2014 1
5 3 2016 1
6 4 2012 0
7 4 2016 1
8 4 2018 2
As you can see, women 2 and 5 have no children in any year, so they are dropped.
base R option:
df[df$ID %in% df$ID[df$child!=0], ]
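If the real panel contains missing values in child, a grouped flag may be easier to reason about than the one-liner above. The following is only a sketch in base R: it assumes that a missing wave should be ignored rather than counted as "no children", and the names is_mother and mothers are made up for illustration.

```r
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5),
  year  = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
  child = c(0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 2, 0, 0)
)

# For each ID, check whether any wave reports at least one child.
# Assumption: NA waves are ignored via na.rm = TRUE.
# ave() writes the result back into a numeric vector (0/1),
# so it is converted to logical before subsetting.
is_mother <- as.logical(
  ave(df$child, df$ID, FUN = function(x) any(x > 0, na.rm = TRUE))
)

# Keep every row (all years) of the women flagged as mothers.
mothers <- df[is_mother, ]
unique(mothers$ID)
#> [1] 1 3 4
```

Keeping the flag as a separate vector also makes it easy to inspect which IDs are dropped, for example with unique(df$ID[!is_mother]).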
Data
df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))
CodePudding user response:
df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))
library(data.table)
setDT(df)[, .SD[any(child > 0)], by = ID]
#> ID year child
#> 1: 1 2012 0
#> 2: 1 2014 0
#> 3: 1 2016 1
#> 4: 3 2014 1
#> 5: 3 2016 1
#> 6: 4 2012 0
#> 7: 4 2016 1
#> 8: 4 2018 2
Created on 2022-05-19 by the reprex package (v2.0.1)