applying function to select columns in list of dataframes in r

Time:07-13

I have a list of thousands of data frames.

Each one has the following structure:

structure(list(frame = c(222, 223, 224, 225, 226, 227, 228, 229, 
230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 
243, 244, 245, 246, 247, 248, 249, 250, 251, 252), room = c("B6", 
NA, NA, NA, NA, "B6", NA, NA, "B6", NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, "B6", NA, NA, NA, NA, NA, NA, "B6"
), id = c(2, NA, NA, NA, NA, 85, NA, NA, 2, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, NA, NA, NA, NA, NA, NA, 
1), id_prob = c(0.710559149006359, NA, NA, NA, NA, 0.676624962451645, 
NA, NA, 0.650006199807849, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 0.668218888964693, NA, NA, NA, NA, NA, NA, 
0.786722974412071), x = c(1606, NA, NA, NA, NA, 1319, NA, NA, 
1636, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
1316.75, NA, NA, NA, NA, NA, NA, 656.5), y = c(-472.25, NA, NA, 
NA, NA, -516.5, NA, NA, -463.5, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, -520, NA, NA, NA, NA, NA, NA, -941), 
    orientation = c(84.5596680381038, NA, NA, NA, NA, 51.3401926511951, 
    NA, NA, 71.565048727047, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 63.4349516145757, NA, NA, NA, NA, 
    NA, NA, 120.963756691571), area = c(-133, NA, NA, NA, NA, 
    -98, NA, NA, -140, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, -130, NA, NA, NA, NA, NA, NA, -166)), row.names = c(NA, 
-31L), class = c("tbl_df", "tbl", "data.frame"))

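I have the following code (using na.locf() from the zoo package) that fills NA values in the id, x and y columns with the last observed value, as long as the gap is at most 20 consecutive rows:

library(zoo)
df[c('id','x','y')] <- na.locf(df[c('id','x','y')], na.rm = FALSE, maxgap = 20)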
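To make the maxgap rule concrete, here is a minimal sketch on a toy vector (the vector v and the smaller maxgap = 3 are made up for illustration, and it assumes the zoo package is installed). Per the zoo documentation, runs of more than maxgap NAs are left as NA, while shorter runs are filled with the last observed value:

```r
library(zoo)

# Toy vector: a run of 2 NAs, then a run of 4 NAs
v <- c(1, NA, NA, 5, NA, NA, NA, NA, 9)

# With maxgap = 3 the 2-NA run is filled, but the 4-NA run is longer
# than maxgap, so it stays NA; na.rm = FALSE keeps the original length
filled <- na.locf(v, na.rm = FALSE, maxgap = 3)
print(filled)
# values: 1 1 1 5 NA NA NA NA 9
```

With maxgap = 20, as in the code above, any gap of up to 20 rows is filled in the same way.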

This works completely fine on single data frames and results in the following output.

structure(list(frame = c(222, 223, 224, 225, 226, 227, 228, 229, 
230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 
243, 244, 245, 246, 247, 248, 249, 250, 251, 252), room = c("B6", 
NA, NA, NA, NA, "B6", NA, NA, "B6", NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, "B6", NA, NA, NA, NA, NA, NA, "B6"
), id = c(2, 2, 2, 2, 2, 85, 85, 85, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 32, 32, 32, 32, 32, 32, 32, 1), id_prob = c(0.710559149006359, 
NA, NA, NA, NA, 0.676624962451645, NA, NA, 0.650006199807849, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.668218888964693, 
NA, NA, NA, NA, NA, NA, 0.786722974412071), x = c(1606, 1606, 
1606, 1606, 1606, 1319, 1319, 1319, 1636, 1636, 1636, 1636, 1636, 
1636, 1636, 1636, 1636, 1636, 1636, 1636, 1636, 1636, 1636, 1316.75, 
1316.75, 1316.75, 1316.75, 1316.75, 1316.75, 1316.75, 656.5), 
    y = c(-472.25, -472.25, -472.25, -472.25, -472.25, -516.5, 
    -516.5, -516.5, -463.5, -463.5, -463.5, -463.5, -463.5, -463.5, 
    -463.5, -463.5, -463.5, -463.5, -463.5, -463.5, -463.5, -463.5, 
    -463.5, -520, -520, -520, -520, -520, -520, -520, -941), 
    orientation = c(84.5596680381038, NA, NA, NA, NA, 51.3401926511951, 
    NA, NA, 71.565048727047, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 63.4349516145757, NA, NA, NA, NA, 
    NA, NA, 120.963756691571), area = c(-133, NA, NA, NA, NA, 
    -98, NA, NA, -140, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, -130, NA, NA, NA, NA, NA, NA, -166)), row.names = c(NA, 
-31L), class = c("tbl_df", "tbl", "data.frame"))

However, to keep track of which rows were filled in and which were already present in the raw data, I only want to apply this to specific columns. That is, it is critical that only the NA values in the 3 specified columns get filled in; all other columns should remain NA.
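One way to use the untouched columns for that bookkeeping is sketched below. It is a minimal sketch that assumes a raw row always has room set (as in the sample data above); the miniature data frame and the filled column name are hypothetical. After filling only id, a row counts as filled in when room is still NA but id now has a value:

```r
library(zoo)

# Hypothetical miniature of one data frame; room and id_prob come only from raw rows
df <- data.frame(
  frame   = 1:6,
  room    = c("B6", NA, NA, "B6", NA, NA),
  id      = c(2, NA, NA, 85, NA, NA),
  id_prob = c(0.71, NA, NA, 0.68, NA, NA)
)

# Fill only the selected column; room and id_prob keep their NAs
df["id"] <- na.locf(df["id"], na.rm = FALSE, maxgap = 20)

# Assumption: raw rows always have room set, so a filled row has room = NA
# but a non-NA id after filling
df$filled <- is.na(df$room) & !is.na(df$id)
print(df)
```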

To run this on every data frame in the list, I tried:

test <- lapply(list, function(x) na.locf(x[c('id','x','y')],na.rm = F, maxgap = 20))

Unfortunately, this returns only those 3 columns and drops all the other columns from each data frame. The following alternative keeps every column, but it fills in the gaps in all of them:

test <- lapply(list, function(x) na.locf(x,na.rm = F, maxgap = 20))

Is there a way to apply my original code to the entire list of dataframes?

Thanks!

Answer:

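You can use the same code that you used for a single data frame inside the function you pass to lapply(). Assign the filled columns back into x, then return the whole data frame, because lapply() keeps whatever the function returns:

library(zoo)

test <- lapply(list, function(x) {
  # Fill only id, x and y; every other column keeps its NAs
  x[c('id','x','y')] <- na.locf(x[c('id','x','y')], na.rm = FALSE, maxgap = 20)
  # Return the full data frame, not just the three filled columns
  x
})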
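As a quick check, here is a sketch that runs the answer's approach on a hypothetical list of two small data frames (the data, df_list and cols are made up for illustration, and zoo must be installed). Every original column is kept, and only id, x and y are filled:

```r
library(zoo)

# Hypothetical list of two small data frames with a subset of the question's columns
df_list <- list(
  data.frame(frame = 1:4, room = c("B6", NA, NA, "B6"),
             id = c(2, NA, NA, 85), x = c(1606, NA, NA, 1319),
             y = c(-472.25, NA, NA, -516.5), area = c(-133, NA, NA, -98)),
  data.frame(frame = 1:3, room = c("B6", NA, "B6"),
             id = c(32, NA, 1), x = c(1316.75, NA, 656.5),
             y = c(-520, NA, -941), area = c(-130, NA, -166))
)

cols <- c("id", "x", "y")

filled_list <- lapply(df_list, function(d) {
  # Overwrite only the selected columns
  d[cols] <- na.locf(d[cols], na.rm = FALSE, maxgap = 20)
  # Return the whole data frame so no columns are lost
  d
})

str(filled_list[[1]])
```

The function argument is called d here only to avoid confusion with the x column, and naming the list df_list instead of list avoids confusion with base R's list() function.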