Note: Dates are formatted as DD.MM.
I have the closing prices for a number of companies (here: A, B, C) for a time frame (here: Jan 1st to Jan 5th). The df looks like this:
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))
I want to normalize the data using the z-score, z = (x - μ) / σ. For A on 01.01. this would be (102 - 113.2) / 13.66382 ≈ -0.8197, using the mean of A and its sample standard deviation.
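A quick R check of that single value, as a sketch. Note that R's sd() returns the sample standard deviation (dividing by n - 1), which is also what both answers rely on:

```r
# Prices of company A from the question
A <- c(102, 103, 107, 120, 134)

mu      <- mean(A)             # 113.2
sigma   <- sd(A)               # sample standard deviation (n - 1 denominator), about 13.66
z_first <- (A[1] - mu) / sigma
round(z_first, 4)              # -0.8197
```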
How do I apply this to all my observations? I'm guessing it can be done with the mutate function in dplyr, but I can't figure out how to actually do it.
Answer:
In dplyr, I think you'll need to use something like across(c(A, B, C), ...).
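A minimal sketch of that dplyr approach, assuming dplyr >= 1.0.0 (the version that introduced across()). The formula passed to across() is applied to each listed column, and the date column is left untouched:

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Replace each price column with its z-score (sd() uses the n - 1 denominator)
df_z <- df1 %>%
  mutate(across(c(A, B, C), ~ (.x - mean(.x)) / sd(.x)))

df_z
```

If you would rather keep the original prices and add new columns, across() also takes a .names argument, e.g. .names = "z_{.col}".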
Just to offer an alternative method using data.table, which updates the table by reference, i.e. there is no need to write something like df1 <- df1 %>% ... in this situation.
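library(data.table)
setDT(df1)  # convert df1 to a data.table in place
cols <- c("A", "B", "C")
# Overwrite each price column with its z-score, by reference
df1[, (cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
df1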
date A B C
1: 01.01. -0.8196829 -0.1096817 1.3324198
2: 02.01. -0.7464969 0.1645225 0.5921866
3: 03.01. -0.4537530 1.5355438 -0.5181632
4: 04.01. 0.4976646 -0.3838859 -0.1480466
5: 05.01. 1.5222682 -1.2064987 -1.2583965
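The data.table version overwrites the prices. If you want to keep them, as transform() does in the base R answer, a sketch that assigns the z-scores to new columns instead (the z. prefix is an arbitrary, hypothetical naming choice):

```r
library(data.table)

df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))
setDT(df1)                        # convert to a data.table by reference

cols  <- c("A", "B", "C")
zcols <- paste0("z.", cols)       # hypothetical names for the new columns

# Add the z-score columns by reference; A, B and C keep their prices
df1[, (zcols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
df1
```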
Answer:
Actually, no package is required at all; write a function and lapply it over the respective columns.
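# \(x) is the anonymous-function shorthand available since R 4.1.0
z <- \(x) (x - mean(x)) / sd(x)
# df1[-1] drops the date column; transform() appends the results as z.A, z.B, z.C
transform(df1, z=lapply(df1[-1], z))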
# date A B C z.A z.B z.C
# 1 01.01. 102 94 55 -0.8196829 -0.1096817 1.3324198
# 2 02.01. 103 95 53 -0.7464969 0.1645225 0.5921866
# 3 03.01. 107 100 50 -0.4537530 1.5355438 -0.5181632
# 4 04.01. 120 93 51 0.4976646 -0.3838859 -0.1480466
# 5 05.01. 134 90 48 1.5222682 -1.2064987 -1.2583965
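If you would rather overwrite the prices than append z.* columns, the same helper can be assigned back into the data frame. As a cross-check, base R's scale() centres a column and divides by its standard deviation (also n - 1), so it should give the same numbers; this is a sketch, not part of the original answer:

```r
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# Same helper, written with function() so it also runs on R < 4.1.0
z <- function(x) (x - mean(x)) / sd(x)

# Overwrite every column except date with its z-score
df_inplace <- df1
df_inplace[-1] <- lapply(df_inplace[-1], z)

# Built-in alternative: scale() returns a one-column matrix, so drop it back to a vector
df_scaled <- df1
df_scaled[-1] <- lapply(df_scaled[-1], function(x) as.vector(scale(x)))

all.equal(df_inplace, df_scaled)  # TRUE
```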