I'm trying to do something that should be simple but is proving not to be. I have a time series data set and would like to perform row-wise normalization, so that for each observation the value becomes (x - mean(row)) / stdev(row).
This was one attempt, but to no avail. I've also replaced NA values with 0, so that doesn't seem to be the issue.
norm <- for (i in 1:nrow(clusterdatairaq2)){
for(j in 2:ncol(clusterdatairaq2)) {
clusterdatairaq2[i,j] <- (clusterdatairaq2[i,j] - mean(clusterdatairaq2[i,]))/ sd(clusterdatairaq2[i,])
}
}
Thanks in advance for any help!!
CodePudding user response:
set.seed(42)
# name the columns A..E so the printed output below matches
mtx <- matrix(sample(99, size=6*5, replace=TRUE), nrow=6, dimnames=list(NULL, LETTERS[1:5]))
df <- cbind(data.frame(id = letters[1:6]), mtx)
df
# id A B C D E
# 1 a 49 47 26 95 58
# 2 b 65 24 3 5 97
# 3 c 25 71 41 84 42
# 4 d 74 89 89 34 24
# 5 e 18 37 27 92 30
# 6 f 49 20 36 3 43
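# normalize each row over the numeric columns only (drop the id column);
# apply() returns one column per row, so transpose the result back
out <- t(apply(df[,-1], 1, function(X) (X-mean(X)) / sd(X)))
colnames(out) <- paste0(colnames(df[,-1]), "_norm")
# append the normalized columns to the original data
df <- cbind(df, out)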
df
# id A B C D E A_norm B_norm C_norm D_norm E_norm
# 1 a 49 47 26 95 58 -0.2376354 -0.3168472 -1.1485711 1.5842361 0.1188177
# 2 b 65 24 3 5 97 0.6393668 -0.3611690 -0.8736386 -0.8248320 1.4202728
# 3 c 25 71 41 84 42 -1.1427812 0.7618541 -0.4802994 1.3001207 -0.4388942
# 4 d 74 89 89 34 24 0.3878036 0.8725581 0.8725581 -0.9048751 -1.2280448
# 5 e 18 37 27 92 30 -0.7749098 -0.1291516 -0.4690243 1.7401483 -0.3670625
# 6 f 49 20 36 3 43 1.0067737 -0.5462283 0.3106004 -1.4566088 0.6854630
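The same row-wise result can also be obtained without an anonymous function: scale() standardizes columns, so transposing before and after turns it into a row-wise operation. Below is a minimal sketch on a small made-up matrix `m` (hypothetical data, not the asker's). It may also hint at why the loop in the question fails: mean() applied to a data-frame row such as df[1, ] returns NA with a warning, because a data frame is not a numeric vector, whereas a row of a numeric matrix is.

```r
# Hypothetical toy data: 2 observations (rows) x 3 time points (columns)
m <- matrix(c( 1,  2,  3,
              10, 20, 60),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), c("t1", "t2", "t3")))

# scale() centers and scales each column, so transpose, scale, transpose back
z <- t(scale(t(m)))

# Equivalent formulation with explicit row statistics; the length-nrow
# vectors are recycled down the columns, so element [i, j] uses row i's value
z2 <- (m - rowMeans(m)) / apply(m, 1, sd)

# Sanity checks: every row should now have mean 0 and standard deviation 1
round(rowMeans(z), 10)
apply(z, 1, sd)
```

Working on a numeric matrix (or on df[,-1] converted with as.matrix()) keeps the non-numeric id column out of the arithmetic.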
CodePudding user response:
Assuming we have a data frame like this:
library(dplyr)
df = tibble(
Destination = c("Belgium", "Bulgaria", "Czechia"),
`Jan 2008` = sample(1:1000, size=3),
`Feb 2008` = sample(1:1000, size=3),
`Mar 2008` = sample(1:1000, size=3)
)
df
# A tibble: 3 × 4
Destination `Jan 2008` `Feb 2008` `Mar 2008`
<chr> <int> <int> <int>
1 Belgium 811 299 31
2 Bulgaria 454 922 421
3 Czechia 638 709 940
The tidyverse way to do this (which I think is better than base R here) is to pivot to long form, group by row, and scale within each group:
library(dplyr)
library(tidyr)
scaled = df %>%
pivot_longer(`Jan 2008`:`Mar 2008`) %>%
group_by(Destination) %>%
mutate(value = as.numeric(scale(value))) %>%
ungroup()
scaled
Destination name value
<chr> <chr> <dbl>
1 Belgium Jan 2008 1.09
2 Belgium Feb 2008 -0.205
3 Belgium Mar 2008 -0.881
4 Bulgaria Jan 2008 -0.517
5 Bulgaria Feb 2008 1.15
6 Bulgaria Mar 2008 -0.635
7 Czechia Jan 2008 -0.787
8 Czechia Feb 2008 -0.338
9 Czechia Mar 2008 1.13
Now, you could pivot it back to the original form, but there's not much point, because analysis will be much easier in long form:
scaled %>% pivot_wider(names_from=name, values_from=value)
# A tibble: 3 × 4
Destination `Jan 2008` `Feb 2008` `Mar 2008`
<chr> <dbl> <dbl> <dbl>
1 Belgium 1.09 -0.205 -0.881
2 Bulgaria -0.517 1.15 -0.635
3 Czechia -0.787 -0.338 1.13
CodePudding user response:
I used the mtcars dataset as an example:
library(tidyverse)
mtcars %>% # the dataset
select(disp) %>% # disp is the column we want to normalize, just as an example
mutate(disp2 = (disp - mean(disp)) / sd(disp)) # disp2 is the name of the normalized column
CodePudding user response:
A dplyr solution, re-using @Migwell's toy example (please provide a reproducible example in your question):
library(dplyr)
library(data.table) # needed for data.table()
df = data.table(
Destination = c("Belgium", "Bulgaria", "Czechia"),
`Jan 2008` = sample(1:1000, size=3),
`Feb 2008` = sample(1:1000, size=3),
`Mar 2008` = sample(1:1000, size=3))
> df
Destination Jan 2008 Feb 2008 Mar 2008
1: Belgium 443 114 628
2: Bulgaria 755 801 493
3: Czechia 123 512 517
You can use:
df2 <- df %>% select(`Jan 2008`:`Mar 2008`) %>% mutate(normJan2008 = (`Jan 2008` - rowMeans(., na.rm = TRUE)) / apply(., 1, sd, na.rm = TRUE))
> df2
Jan 2008 Feb 2008 Mar 2008 normJan2008
1: 443 114 628 0.1843742
2: 755 801 493 0.4333577
3: 123 512 517 -1.1546299
Repeat this for every column you need to normalize.
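Rather than writing one mutate() per column, the row statistics can be computed once and applied to all columns in a single step. The following is a sketch only: it assumes dplyr >= 1.0 (for across() and its .names argument), uses a plain data.frame holding the values printed above, and the names `month_cols`, `row_mean` and `row_sd` are introduced here purely for illustration.

```r
library(dplyr)

# Same values as the printed example, in a plain data.frame;
# check.names = FALSE keeps the column names with spaces intact
df <- data.frame(
  Destination = c("Belgium", "Bulgaria", "Czechia"),
  `Jan 2008` = c(443, 755, 123),
  `Feb 2008` = c(114, 801, 512),
  `Mar 2008` = c(628, 493, 517),
  check.names = FALSE
)

# Illustrative helper objects (not part of the original answer)
month_cols <- c("Jan 2008", "Feb 2008", "Mar 2008")
row_mean <- rowMeans(df[month_cols], na.rm = TRUE)       # one mean per row
row_sd   <- apply(df[month_cols], 1, sd, na.rm = TRUE)   # one sd per row

# Add a normalized copy of every month column, named norm_<original name>
df2 <- df %>%
  mutate(across(all_of(month_cols),
                ~ (.x - row_mean) / row_sd,
                .names = "norm_{.col}"))
df2
```

Computing the row mean and standard deviation up front also avoids recalculating them from columns that have already been overwritten, which is one of the pitfalls of normalizing in place.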