Home > Back-end >  Repeatedly subtract one column from another with conditions on first column
Repeatedly subtract one column from another with conditions on first column

Time:10-05

Im trying to write a function to subtract the value of one column from the previous one.


This is the set-up of my dataset:

  lactation wt_jan wt_feb wt_mar wt_apr wt_may wt_jun wt_jul wt_aug wt_sep wt_oct wt_nov wt_dec
1         1  600.0  612.5  625.0  637.5  643.8  650.0  656.3  662.5  668.8  675.0  681.3  687.5
2         2  693.8  700.0  706.3  712.5  715.6  718.8  721.9  725.0  728.1  731.3  734.4  737.5
3         3  740.6  743.8  746.9  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0
4         4  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0
5         5  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0
6         6  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0  750.0

I previously had it set up to be the transpose, then do the subtration using this code:

calc_growth <- function(wt, days){
  growth = (wt - lag(wt , default = first(wt)))/days
  return(growth)
}

But this set-up wasn't working within my bigger project and for the 'jan' month, I need it to subtract from 'dec' of the previous lactation, not just itself.

In the end I'm looking to have an output that looks just like the input data, but with another set of columns named month_growth (e.g. jan_growth).

I tried following this one, but without much luck: subtract value from previous row by group

CodePudding user response:

I hope I understood it correctly, but here is an attempt, using pivot_longer and pivot_wider from tidyr:

library(dplyr)
library(tidyr)
# create dataset: 
lact_data <- structure(list(
  lactation = 1:6, 
  wt_jan = c(600, 693.8, 740.6, 750, 750, 750), 
  wt_feb = c(612.5, 700, 743.8, 750, 750, 750), 
  wt_mar = c(625, 706.3, 746.9, 750, 750, 750), 
  wt_apr = c(637.5, 712.5, 750, 750, 750, 750), 
  wt_may = c(643.8, 715.6, 750, 750, 750, 750), 
  wt_jun = c(650, 718.8, 750, 750, 750, 750), 
  wt_jul = c(656.3, 721.9, 750, 750, 750, 750), 
  wt_aug = c(662.5, 725, 750, 750, 750, 750), 
  wt_sep = c(668.8, 728.1, 750, 750, 750, 750), 
  wt_oct = c(675, 731.3, 750, 750, 750, 750), 
  wt_nov = c(681.3, 734.4, 750, 750, 750, 750), 
  wt_dec = c(687.5, 737.5, 750, 750, 750, 750)
  ), 
  class = "data.frame", row.names = c(NA, -6L))

long_dat <- lact_data %>% 
  pivot_longer(-lactation, names_to = c(NA, "month"), 
               values_to = "wt", names_sep = "_") %>% 
  mutate(growth = wt-lag(wt,  default = first(wt)))

pivot_wider(long_dat, names_from = c("month"), values_from = c("wt", "growth"))

output:

# A tibble: 6 × 25
# lactation wt_jan wt_feb wt_mar wt_apr wt_may wt_jun wt_jul wt_aug wt_sep wt_oct wt_nov wt_dec growth_…¹ growt…² growt…³ growt…⁴ growt…⁵ growt…⁶ growt…⁷
# <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#   1         1   600    612.   625    638.   644.   650    656.   662.   669.   675    681.   688.      0      12.5    12.5    12.5     6.30    6.20    6.30
# 2         2   694.   700    706.   712.   716.   719.   722.   725    728.   731.   734.   738.      6.30    6.20    6.30    6.20    3.10    3.20    3.10
# 3         3   741.   744.   747.   750    750    750    750    750    750    750    750    750       3.10    3.20    3.10    3.10    0       0       0   
# 4         4   750    750    750    750    750    750    750    750    750    750    750    750       0       0       0       0       0       0       0   
# 5         5   750    750    750    750    750    750    750    750    750    750    750    750       0       0       0       0       0       0       0   
# 6         6   750    750    750    750    750    750    750    750    750    750    750    750       0       0       0       0       0       0       0   
# # … with 5 more variables: growth_aug <dbl>, growth_sep <dbl>, growth_oct <dbl>, growth_nov <dbl>, growth_dec <dbl>, and abbreviated variable names
# #   ¹​growth_jan, ²​growth_feb, ³​growth_mar, ⁴​growth_apr, ⁵​growth_may, ⁶​growth_jun, ⁷​growth_jul
# # ℹ Use `colnames()` to see all variable names

If you need more control of the arrangement in the dataset, you can explicitely arrange the data first:

long_dat <- lact_data %>% 
  pivot_longer(-lactation, names_to = c(NA, "month"), values_to = "wt", names_sep = "_", ) %>% 
   mutate(
     month = factor(month, levels = substr(tolower(month.name), 1, 3))
     ) %>%
   arrange(lactation, month) %>%
  mutate(growth = wt-lag(wt,  default = first(wt)))
  • Related