Home > Back-end >  Divide DataFrame by its lag Error in FUN(left, right) : non-numeric argument to binary operator
Divide DataFrame by its lag Error in FUN(left, right) : non-numeric argument to binary operator

Time:04-14

I have a dataframe with some asset prices. I would like to get the returns by dividing each column by its lag.

I have the following code:

df=structure(list(A = structure(c(19060, 19061, 19062, 19065, 19066, 
                                  19067, 19068, 19069, 19072, 19073), class = "Date"), B = list(
                                    8359.87382308, 8361.93190783, 8364.50494403, 8368.10699178, 
                                    8372.3097342, 8376.12364242, 8380.18626964, 8384.41865593, 
                                    8387.38878791, 8390.92762529), C = list(48216.51809143, 48205.4923213, 
                                                                            48141.39976957, 48018.63247926, 48051.45532043, 48091.22412151, 
                                                                            48331.57538464, 48445.10388053, 48469.49369979, 48416.3347324), 
                  D = list(17.34859405, 17.34882281, 17.34760762, 17.35130834, 
                           17.36756919, 17.37318196, 17.39286354, 17.40391027, 17.40949275, 
                           17.41696558), E = list(43.16646812, 43.17551867, 43.16216113, 
                                                  43.17305155, 43.25924615, 43.33736433, 43.40667531, 43.45909204, 
                                                  43.55074062, 43.60648949), F = list(25.83271054, 25.84295702, 
                                                                                      25.83724976, 25.85374492, 25.94544031, 26.02025781, 26.07185775, 
                                                                                      26.1007222, 26.18709194, 26.23574844), G = list(28.510282, 
                                                                                                                                      28.525803, 28.539868, 28.55125, 28.563602, 28.576996, 
                                                                                                                                      28.596242, 28.613565, 28.629501, 28.646878), H = list(
                                                                                                                                        11.88687581, 11.89326214, 11.89119058, 11.88914284, 11.8977977, 
                                                                                                                                        11.90032756, 11.90797575, 11.91538244, 11.91918169, 11.92787376), 
                  I = list(29.45064815, 29.46287037, 29.47530093, 29.4877662, 
                           29.49959491, 29.5124632, 29.526301, 29.5401692, 29.5535557, 
                           29.5686575), J = list(15.61856767, 15.62738944, 15.63486662, 
                                                 15.61386311, 15.64214889, 15.64560406, 15.69008278, 15.73562823, 
                                                 15.7482602, 15.78539406)), row.names = c(NA, -10L), class = c("tbl_df", 
                                                                                                               "tbl", "data.frame"))


df2=mutate(df[,c(2:ncol(df))]/lag(df[,c(2:ncol(df))]))

However I get the error: Error in FUN(left, right) : non-numeric argument to binary operator

I have no idea why this is giving me an issue.

CodePudding user response:

The columns in the dput are all list columns. We can unnest them first and loop over the numeric columns (across) and divide by the lag.

library(dplyr)
library(tidyr)
df2 <- df %>% 
    unnest(where(is.list)) %>% 
    mutate(across(where(is.numeric), ~ .x/lag(.x)))

-output

df2
# A tibble: 10 × 10
   A              B      C     D     E     F     G     H     I      J
   <date>     <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2022-03-09 NA    NA     NA    NA    NA    NA    NA    NA    NA    
 2 2022-03-10  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 3 2022-03-11  1.00  0.999  1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 4 2022-03-14  1.00  0.997  1.00  1.00  1.00  1.00  1.00  1.00  0.999
 5 2022-03-15  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 6 2022-03-16  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 7 2022-03-17  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 8 2022-03-18  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
 9 2022-03-21  1.00  1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00 
10 2022-03-22  1.00  0.999  1.00  1.00  1.00  1.00  1.00  1.00  1.00 

It is possible to divide the data with more than one column at once with lag, but then we don't need mutate i.e

df2 <- unnest(df, where(is.list))
df2[-1] <- df2[-1]/lag(df2[-1])
  • Related