I am trying to do a function of decumulation with a for loop in R because the financial information provided by the company is accumulated for different concepts (this means that the info of January is only of January, the info of February is the sum of January and February, the one of March is the sum of January, February and March, etc.).
For example, let's say that I have the next dataframe:
Concepts <- c("Concept1", "Concept2", "Concept3")
January <- c(5,10,16)
February <- c(9,14,20)
March <- c(16,20,23)
df <- data.frame(Concepts, January, February, March)
This will give me the next dataframe:
Concepts January February March
Concept1 5 9 16
Concept2 10 14 20
Concept3 16 20 23
What I need to achieve is the next dataframe (Notice that February is the difference between February and January, and March is the difference between February and March):
Concepts January February March
Concept1 5 4 7
Concept2 10 4 6
Concept3 16 4 3
To achieve the second dataframe, I first created an empty dataframe with the same amount of rows of df, then with a for loop cbind the first two rows of the dataframe (because they do not need any manipulation) and with the index add the next ones after calculated the difference. The above in code is as follows:
df <- data.frame(Concepts, January, February, March)
df2 <- data.frame(matrix(nrow=nrow(df),ncol=ncol(df))) #Empty Dataframe with the same number of rows
for(i in 1:ncol(df)) {
if(i == 1){
df2 <- cbind(df2, df[ , i])
} else if (i == 2){
df2 <- cbind(df2, df[, i])
} else {
diference <- df[,i] - df[,i-1]
df2 <- cbind(df2,diference)
}
I get the following error:
error in
[.data.table
(df, , i) : j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
I would love to receive a correction to my code or some alternative that allows me to calculate the above for a dataframe of many years.
CodePudding user response:
First note that if you apply
base function diff
to the months columns, you will get one column less but transposed.
apply(df[-1], 1, diff)
# [,1] [,2] [,3]
#February 4 4 4
#March 7 6 4
So transpose it to get the right orientation.
t(apply(df[-1], 1, diff))
# February March
#[1,] 4 7
#[2,] 4 6
#[3,] 4 4
And cbind
it with the first 2 columns. Since the first argument is a subset of a data.frame, the method called is cbind.data.frame
and the result is also a df.
cbind(df[1:2], t(apply(df[-1], 1, diff)))
# Concepts January February March
#1 Concept1 5 4 7
#2 Concept2 10 4 6
#3 Concept3 15 4 4
CodePudding user response:
This may not be the most elegant, but it should work. The trick is to extract the numeric portion of the data frame and apply diff—by row—to the results, transpose it, and paste it back to the initial values.
df <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
January = c(5,10,16),
February = c(9,14,20),
March = c(16,20,23),
April = c(20, 27, 33))
dfdiff <- apply(df[, -1L], 1L, diff)
df2 <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
January = c(5,10,16))
df2 <- cbind(df2, t(dfdiff))
df2
Concepts January February March April
1 Concept1 5 4 7 4
2 Concept2 10 4 6 7
3 Concept3 16 4 3 10
Now that you know how it works, for a more efficent call you could just do:
df2 <- cbind(df[, 1:2], t(apply(df[, -1L], 1L, diff)))
Which should work for any size dataframe of the structre you had above: One heading column and the remainder being cumulative data columns.
Speed comparison with tidyverse method
microbenchmark(TV = df2 <- df %>% pivot_longer(!Concepts) %>% group_by(Concepts) %>%
dplyr::mutate(value2 = value - lag(value, default = first(value))) %>%
rowwise %>%mutate(value2 = ifelse(value2 == 0, value, value2)) %>%
select(-value) %>%pivot_wider(names_from = "name", values_from = "value2"),
BASE = df2 <- cbind(df[, 1:2], t(apply(df[, -1L], 1L, diff))),
times = 1000L, control = list(order = 'block'))
Unit: microseconds
expr min lq mean median uq max neval cld
TV 11141.7 11554.05 12245.1253 11803.75 12300.25 22903.5 1000 b
BASE 160.4 164.35 176.8356 165.70 168.70 3833.2 1000 a
CodePudding user response:
A tidyverse
solution:
library(tidyverse)
df %>%
pivot_longer(!Concepts) %>%
group_by(Concepts) %>%
mutate(value2 = value - lag(value, default = first(value))) %>%
rowwise %>%
mutate(value2 = ifelse(value2 == 0, value, value2)) %>%
select(-value) %>%
pivot_wider(names_from = "name", values_from = "value2")
Output
# A tibble: 3 × 4
Concepts January February March
<chr> <dbl> <dbl> <dbl>
1 Concept1 5 4 7
2 Concept2 10 4 6
3 Concept3 16 4 3