How to vectorize a dataframe in a for loop

Time:07-23

How would you vectorize this for loop in R? This is just a small example, but I would need to do this for 11 columns.

fill_sku2 <- function(df1) {
  for (i in 2:nrow(df1)) {
    # Copy the value from the row above into blank cells
    if (df1[i, 'sku2'] == '') {
      df1[i, 'sku2'] <- df1[i - 1, 'sku2']
    }
  }
  return(df1)
}

Answer:

The task here, if I understand it correctly, is to replace each blank with the most recent non-blank value above it in the same column.

Here's a vectorized tidyverse approach:

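library(tidyverse)

df1 %>%
  # Turn empty strings into NA so that fill() can see them
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Carry the last non-NA value down each column
  fill(everything(), .direction = "down")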

This takes the df1 data frame, converts blanks to NAs, then uses tidyr::fill() on every column to carry the last non-missing value down. Note that any NAs already present in the data are filled as well.
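If you would rather avoid the tidyverse dependency, the same "carry the last non-blank value down" idea can be written in base R. The sketch below is my own illustration, not part of tidyr: `fill_down` is a hypothetical helper name, and it treats both "" and NA as blank. It records the row number of every non-blank entry, takes a running maximum with cummax() so that each row points at the most recent non-blank row, and then indexes with that.

```r
# Hypothetical helper: replace blanks ("" or NA) with the last non-blank value above
fill_down <- function(x) {
  keep <- !is.na(x) & x != ""
  # Row number for non-blank entries, 0 for blanks; the running max then
  # points every row at the most recent non-blank row at or above it
  idx <- cummax(seq_along(x) * keep)
  out <- x
  # Leading blanks have no earlier value (idx == 0) and are left unchanged
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

print(fill_down(c("A1", "", "", "B7", "")))
# [1] "A1" "A1" "A1" "B7" "B7"
```

Because it works on the whole vector at once, there is no loop over rows, and runs of consecutive blanks are handled in a single pass.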

I expect this to be much faster than your loop, but if you need the fastest possible approach, it may be worth looking into the data.table or collapse packages.

Answer:

You were on the right path with vectorization: this for loop can be replaced entirely by vectorized operations. My approach uses the indexing operator [].

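# One vectorized pass over column 1: each blank gets the value from the row above
df1[df1[1] == "", 1] <- df1[which(df1[1] == "") - 1, 1]

df1[1] == "" returns a logical result. I use which() to turn it into row numbers, so that subtracting 1 gives the row above each blank. Be aware that all replacement values are read before any are written, so in a run of consecutive blanks only the first one gets filled (the others receive the blank above them). Also, if row 1 itself is blank, which() - 1 produces a 0 index that R silently drops, so the replacement is one value short and no longer lines up with the blank rows.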
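One way to keep this indexing approach but also handle runs of blanks is to repeat the vectorized pass until nothing more can be filled; each pass fills the top remaining blank of every run, so the number of passes equals the length of the longest run. A minimal sketch for a single column vector, using the hypothetical helper name `fill_runs`:

```r
# Hypothetical helper: repeat the "copy from the row above" step until no
# remaining blank can be filled
fill_runs <- function(x) {
  repeat {
    blank <- which(x == "")
    blank <- blank[blank > 1]   # row 1 has no row above it
    src <- x[blank - 1]
    # Stop when nothing is left to fill, or every source value is itself blank
    if (length(blank) == 0 || all(!is.na(src) & src == "")) break
    x[blank] <- src
  }
  x
}

print(fill_runs(c("A1", "", "", "B7", "")))
# [1] "A1" "A1" "A1" "B7" "B7"
```

Leading blanks stay blank, since there is no earlier value to copy.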

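To make it work across all 11 columns without copy-pasting, here is a function that takes the data frame and a column index:

rm.blank <- function(df, x) {
  # Rows of column x that are blank; row 1 has no row above it, so skip it
  blank <- which(df[[x]] == "")
  blank <- blank[blank > 1]
  # Copy the value from the row above (single pass: runs of blanks are only partly filled)
  df[blank, x] <- df[blank - 1, x]
  df[x]
}

I haven't yet figured out how to write a second function that applies this one to every column, so I used another for loop:

for (i in seq_len(ncol(df1))) {
  df1[i] <- rm.blank(df1, i)
}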
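The remaining loop over columns can itself be replaced with lapply(), which applies a function to every column; assigning into df1[] writes the results back while keeping df1 a data frame. A sketch, assuming a column-level helper (the hypothetical `rm_blank_col`, which takes one column vector instead of a column index) and made-up sample data:

```r
# Hypothetical column-level variant of rm.blank: works on a single vector
rm_blank_col <- function(x) {
  blank <- which(x == "")
  blank <- blank[blank > 1]   # row 1 has no row above it
  x[blank] <- x[blank - 1]    # single pass: fills isolated blanks only
  x
}

# Made-up sample data standing in for the real 11 columns
df1 <- data.frame(
  sku1 = c("A1", "", "B7", "", "C3"),
  sku2 = c("X", "Y", "", "Z", ""),
  stringsAsFactors = FALSE
)

# Apply the helper to every column; df1[] keeps the data.frame structure
df1[] <- lapply(df1, rm_blank_col)
print(df1)
```

Like the one-liner, this single pass only fills isolated blanks; swapping in a helper that handles runs of blanks would not change the lapply() call.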