I have a list of six data frames, five of which have a column "Z". To proceed with my script, I need to remove the data frame that doesn't have column Z, so I tried the following code:
for(i in 1:length(df)){
if(!("Z" %in% colnames(df[[i]])))
{
df[[i]] = NULL
}
}
This seemed to do the job (it removed the one data frame from the list that didn't have column Z), but I still got the message "Error in df[[i]] : subscript out of bounds". Why is that, and how can I get around the error?
CodePudding user response:
The base Filter function works well here:
df <- Filter(\(x) "Z" %in% names(x), df)
As to why your method doesn't work: for(i in 1:length(df)) iterates over the indices 1 through the original length(df). As soon as df[[i]] = NULL happens once, df is shorter than it was when the loop started, so the last iteration will be out of bounds. You'll also skip some items: if df[[2]] is removed, then the original df[[3]] is now df[[2]] and the current df[[3]] was originally df[[4]], so you hop over the original df[[3]] without checking it. Lesson: don't change the length of an object while iterating over it.
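If you do want to keep a loop, one possible workaround is to walk the indices from last to first, so that a deletion only shifts elements that have already been checked. This is a minimal sketch, not the asker's actual data: the list dfs and its three small data frames are made up for illustration.

```r
# Hypothetical example data: three small data frames; the middle one lacks "Z".
dfs <- list(
  data.frame(A = 1:2, Z = 3:4),
  data.frame(A = 1:2, B = 3:4),
  data.frame(C = 1:2, Z = 5:6)
)

# Iterate from the last index down to the first. Removing element i
# only moves elements at positions greater than i, which were already visited.
for (i in rev(seq_along(dfs))) {
  if (!("Z" %in% colnames(dfs[[i]]))) {
    dfs[[i]] <- NULL
  }
}

length(dfs)
```

Using seq_along(dfs) rather than 1:length(dfs) also behaves sensibly when the list is empty, since 1:0 would iterate over 1 and 0.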
CodePudding user response:
If df is your list of 6 data frames, you can do this:
df <- df[sapply(df, \(i) "Z" %in% colnames(i))]
The reason you get the error is that your loop reduces the length of df, so i eventually exceeds the (new) length of df. There will be no error if the only frame in df without column Z is the last one.
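To make the subsetting approach easier to follow, here is a sketch that splits it into two steps: first build a logical vector with one entry per data frame, then index the list with it. The list frames and its element names a, b, c are invented for this example, and vapply is used in place of sapply only so that the result type is guaranteed to be logical.

```r
# Hypothetical example data: a named list in which only "b" lacks column Z.
frames <- list(
  a = data.frame(A = 1:2, Z = 3:4),
  b = data.frame(A = 1:2, B = 3:4),
  c = data.frame(C = 1:2, Z = 5:6)
)

# One TRUE/FALSE per data frame: does it have a column named "Z"?
has_z <- vapply(frames, function(x) "Z" %in% colnames(x), logical(1))

# Subset once, after all checks are done, so the list never changes length
# while it is being inspected.
kept <- frames[has_z]

names(kept)
```

Because the list is only subset after every element has been tested, no index can run past the end and no element is skipped.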
CodePudding user response:
Using purrr::discard:
list_df <- list(df1, df2)
purrr::discard(list_df, ~any(colnames(.x) == "Z"))
Output:
[[1]]
A B
1 1 3
2 3 4
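As you can see, it removed the first data frame, which had column Z. Note that this is the opposite of what the question asks for: to remove the data frames without column Z instead, use purrr::keep with the same predicate.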
Data:
df1 <- data.frame(A = c(1,2),
Z = c(1,4))
df2 <- data.frame(A = c(1,3),
B = c(3,4))