How to use index to filter rows in plm R dataframe?

Time:09-15

I need to filter out (drop) rows with certain index values, e.g. c("b-2022", "e-2022"), from the following example pdata_frame.

data_frame = data.frame(
  code = c("b","b","d","e","d") ,
  year = c(2021, 2022, 2021, 2022, 2022),
  values = c(0,2,1,4,5) 
)

library(plm)    
pdata_frame <- pdata.frame(data_frame, index = c("code","year"), drop.index = FALSE)

#        code year values
# b-2021    b 2021      0
# b-2022    b 2022      2
# d-2021    d 2021      1
# d-2022    d 2022      5
# e-2022    e 2022      4

Currently I use a rather cumbersome approach that hard-codes the conditions and does not use the index at all:

pdata_frame[-which(
  (pdata_frame$code == "b" & pdata_frame$year==2022) |
  (pdata_frame$code == "e" & pdata_frame$year==2022)), ]
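One caveat with the -which() idiom above: if no row matches the conditions, which() returns integer(0), and indexing with -integer(0) selects no rows at all, so every row is dropped. A negated logical mask does not have this problem. The sketch below uses the plain data_frame from the question; the key "z" is hypothetical and chosen only so that nothing matches.

```r
# Data from the question (plain data.frame; plm is not needed for this point)
data_frame <- data.frame(
  code = c("b", "b", "d", "e", "d"),
  year = c(2021, 2022, 2021, 2022, 2022),
  values = c(0, 2, 1, 4, 5)
)

# Hypothetical key "z"/2022 matches no row, so which() returns integer(0)
no_match <- which(data_frame$code == "z" & data_frame$year == 2022)
nrow(data_frame[-no_match, ])  # 0: indexing with -integer(0) selects no rows

# A negated logical mask keeps every row when nothing matches
keep <- !(data_frame$code == "z" & data_frame$year == 2022)
nrow(data_frame[keep, ])       # 5
```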

Is there a way to make use of the index for more succinct filtering, something like pdata_frame[-c(2, 5), ]?

Answer:

One solution (though not the most efficient) is a dplyr approach:

library(dplyr)

pdata_frame %>% 
  mutate(index = paste0(code, "-", year)) %>% 
  filter(!index %in% c("b-2022", "e-2022")) %>% 
  select(-index)

       code year values
b-2021    b 2021      0
d-2021    d 2021      1
d-2022    d 2022      5

Answer:

Alternatively, you can add a new column to the original data_frame (without using the plm package) and filter on that column.

This uses base R only:

d <- c("b-2022", "e-2022")
data_frame <- within(data_frame,name <- paste0(code, "-", year))
subset(data_frame, subset = !name %in% d, select = -c(name))
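Note that subset() on data_frame returns a plain data.frame, not a pdata.frame. If the panel structure is needed afterwards, one option (a sketch, assuming the plm package is installed) is to filter the plain data first, as above, and then rebuild the pdata.frame from the rows that remain:

```r
library(plm)

data_frame <- data.frame(
  code = c("b", "b", "d", "e", "d"),
  year = c(2021, 2022, 2021, 2022, 2022),
  values = c(0, 2, 1, 4, 5)
)

d <- c("b-2022", "e-2022")

# Base R filtering as in the answer above; the result is a plain data.frame
data_frame <- within(data_frame, name <- paste0(code, "-", year))
kept <- subset(data_frame, subset = !name %in% d, select = -c(name))

# Rebuild the panel structure (index and "code-year" row names) from the kept rows
pdata_kept <- pdata.frame(kept, index = c("code", "year"), drop.index = FALSE)
pdata_kept
```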

EDIT:

This can also be done in a single line, without the helper column:

d <- c("b-2022", "e-2022")
subset(data_frame, subset = ! paste0(code, "-", year) %in% d)
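The labels printed in the question (b-2021, b-2022, ...) are the row names that pdata.frame builds from the code and year index by default (its row.names = TRUE argument), so the rows can also be dropped by label directly on pdata_frame, which is close to what the question asks for. A sketch, assuming the plm package is installed; option 2 builds the same labels from index() instead of relying on the row names:

```r
library(plm)

data_frame <- data.frame(
  code = c("b", "b", "d", "e", "d"),
  year = c(2021, 2022, 2021, 2022, 2022),
  values = c(0, 2, 1, 4, 5)
)
pdata_frame <- pdata.frame(data_frame, index = c("code", "year"), drop.index = FALSE)

to_drop <- c("b-2022", "e-2022")

# Option 1: match against the "code-year" row names generated by pdata.frame
filtered <- pdata_frame[!rownames(pdata_frame) %in% to_drop, ]

# Option 2: rebuild the same "code-year" labels from the panel index itself
idx <- index(pdata_frame)
filtered2 <- pdata_frame[!paste(idx[[1]], idx[[2]], sep = "-") %in% to_drop, ]

filtered
```

Because the subsetting goes through [ on the pdata.frame itself, the result should remain a pdata.frame with its index subset to the kept rows.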