I need to Normalize some data values (e.g., subtracting from one A-value to another B-value), according to two factors. Assume a data frame like this one:
mydf <- data.frame("countries" = c(rep("USA",3), rep("China",3), rep("France",3)), "Type" = rep(c("Grass","Cow", "Bunny"), 3), "W1" = rnorm(9, 20, 10), "W2" = rnorm(9, 60, 10))
For each Country, I would like to subtract the values of both W1 and W2 of grass from Cows and Bunnies, for example: W1_Norm_Cows = W1_Cows - W1_Grass
.
I tried to create a pipe using dplyr
, but I got stuck.
normalized_mydf <- mydf %>%
group_by(Country) %>%
mutate(W1_norm = W1-??) #How can I specify which values to use?
mutate(W2_norm = W2-??) #How can I specify which values to use?
I tried to use group_by
twice, but of course, the script is wrong (or not complete) because it subtracts the W1 value to itself, giving zero to all the observations.
CodePudding user response:
I suggest reshaping the dataframe in this way:
mydf %>%
gather("key", "value", -c(countries, Type)) %>%
spread(Type, value) %>%
mutate(Norm_cows = Cow - Grass,
Norm_bunny = Bunny - Grass)
countries key Bunny Cow Grass Norm_cows Norm_bunny
1 China W1 25.347062 20.003938 19.71476 0.2891776 5.632301
2 China W2 69.627360 49.867221 61.53735 -11.6701307 8.090008
3 France W1 9.150308 18.257695 26.26750 -8.0098078 -17.117195
4 France W2 58.396270 65.252741 76.46905 -11.2163100 -18.072781
5 USA W1 18.803997 6.361533 30.94399 -24.5824534 -12.139989
6 USA W2 77.703504 69.438751 60.23095 9.2078025 17.472556
And pivot a bit more to get into what seems to be the desired shape:
mydf %>%
select(-c(Bunny, Cow, Grass)) %>%
gather('new_key', 'value', c(Norm_cows, Norm_bunny)) %>%
mutate(key = paste0(key, "_", new_key)) %>%
select(-new_key) %>%
spread(key, value)
countries W1_Norm_bunny W1_Norm_cows W2_Norm_bunny W2_Norm_cows
1 China 5.632301 0.2891776 8.090008 -11.670131
2 France -17.117195 -8.0098078 -18.072781 -11.216310
3 USA -12.139989 -24.5824534 17.472556 9.207803