Delete Variable only if ALWAYS 0 in Panel Data?

Time:05-20

I have a panel data frame that records how many children each woman has in each survey year. I would like to delete all women who never have children, while keeping women who, for example, had no child in 2016 but had one in 2018. Here's part of the data frame for reference:

ID  year    child
1   2012    0
1   2014    0
1   2016    1
2   2012    0
2   2014    0
2   2016    0
3   2014    1
3   2016    1
4   2012    0
4   2016    1
4   2018    2
5   2018    0
5   2020    0

Can someone help me delete all women who are not mothers?

CodePudding user response:

dplyr option:

library(dplyr)
# Keep every row of a woman whose child counts sum to at least 1
df %>%
  group_by(ID) %>%
  filter(sum(child) >= 1)

Output:

# A tibble: 8 × 3
# Groups:   ID [3]
     ID  year child
  <dbl> <dbl> <dbl>
1     1  2012     0
2     1  2014     0
3     1  2016     1
4     3  2014     1
5     3  2016     1
6     4  2012     0
7     4  2016     1
8     4  2018     2

As you can see, women 2 and 5 never have children, so all of their rows are removed.

base R option:

df[df$ID %in% df$ID[df$child!=0], ]
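To make the one-liner easier to follow, here is a minimal sketch that splits it into two steps. The names `mother_ids` and `mothers` are illustrative only and not part of the original answer; the data is the example from the question.

```r
# Example data from the question
df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                 year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016,
                          2012, 2016, 2018, 2018, 2020),
                 child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))

# Step 1: collect the IDs that have a non-zero child count in at least one year
# (mother_ids is a hypothetical helper name)
mother_ids <- unique(df$ID[df$child != 0])

# Step 2: keep all rows, including the zero-child years, for those IDs
mothers <- df[df$ID %in% mother_ids, ]
```

Because the filter works on IDs rather than on individual rows, the years in which a later mother still had no child stay in the result.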

Data

df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                 year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
                 child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))

CodePudding user response:

df <- data.frame(ID = c(1,1,1,2,2,2,3,3,4,4,4,5,5),
                 year = c(2012, 2014, 2016, 2012, 2014, 2016, 2014, 2016, 2012, 2016, 2018, 2018, 2020),
                 child = c(0,0,1,0,0,0,1,1,0,1,2,0,0))

library(data.table)

setDT(df)[, .SD[any(child > 0)], by = ID]
#>    ID year child
#> 1:  1 2012     0
#> 2:  1 2014     0
#> 3:  1 2016     1
#> 4:  3 2014     1
#> 5:  3 2016     1
#> 6:  4 2012     0
#> 7:  4 2016     1
#> 8:  4 2018     2

Created on 2022-05-19 by the reprex package (v2.0.1)

