I'm creating new features using a unit-length scaling approach. This is the data:
Recieve = c(237, 1781, NA, 3710, 2099)
Sales = c(2509, 25616, NaN, 19224, 6569)
Industry = c("ABC", "ABC", "ABC", "CDE", "CDE")
data = data.frame(Recieve, Sales, Industry, stringsAsFactors = FALSE)
> data
  Recieve Sales Industry
1     237  2509      ABC
2    1781 25616      ABC
3      NA   NaN      ABC
4    3710 19224      CDE
5    2099  6569      CDE
I want to create the new features Recieve_new and Sales_new by applying the unit-length scaling formula within each Industry. The formula is:
unitLength = x / sqrt(sum(x^2))
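As a minimal base-R sketch of this formula, assuming non-finite values should simply be left out of the sum of squares, here is a small helper. `unit_length` is a hypothetical name introduced for illustration, not a function from any package:

```r
# Hypothetical helper: scale a numeric vector to unit (Euclidean) length.
# Non-finite entries (NA, NaN, Inf) are dropped from the sum of squares,
# and they remain non-finite in the result.
unit_length <- function(x) {
  finite_x <- x[is.finite(x)]
  x / sqrt(sum(finite_x^2))
}

# Recieve values of the ABC group
unit_length(c(237, 1781, NA))
```

The NA entry passes straight through the division, so it stays missing in the output while the two finite values are scaled by the same denominator.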
For example, for the entry Recieve = 237 with Industry = "ABC", the sum runs over the Recieve values of the ABC group, so the unit length is calculated as follows:
unitLength = 237 / sqrt((237^2) + (1781^2))
unitLength = 237 / sqrt(56169 + 3171961)
unitLength = 237 / sqrt(3228130)
unitLength = 237 / 1796.69975232
unitLength = 0.13190851709
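Putting the per-Industry grouping and the restriction to finite values together, the intended computation can be sketched as follows, where $g_i$ (a symbol introduced here) denotes the Industry of row $i$:

$$
x_i^{\text{new}} = \frac{x_i}{\sqrt{\sum_{j:\; g_j = g_i,\; x_j \text{ finite}} x_j^2}}
$$

For Recieve in the ABC group, the sum covers only 237 and 1781, which gives the denominator $\sqrt{3228130} \approx 1796.7$ used in the example.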
The calculation should include only finite values; non-finite entries (here NA and NaN) should be excluded from the sum. I want to implement this in R. The expected output is:
  Recieve Sales Industry Recieve_new  Sales_new
1     237  2509      ABC   0.1319085 0.09748012
2    1781 25616      ABC   0.9912619 0.99523747
3      NA   NaN      ABC          NA         NA
4    3710 19224      CDE   0.8703574 0.94627897
5    2099  6569      CDE   0.4924205 0.32335136
Can anyone help me with this?
A tidyverse solution:
library(dplyr)

data %>%
  group_by(Industry) %>%
  # scale each column by the Euclidean norm of its non-missing values within the group
  mutate(across(c(Recieve, Sales),
                ~ .x / sqrt(sum(.x^2, na.rm = TRUE)),
                .names = "{.col}_new"))
  Recieve Sales Industry Recieve_new Sales_new
    <dbl> <dbl> <chr>          <dbl>     <dbl>
1     237  2509 ABC            0.132    0.0975
2    1781 25616 ABC            0.991    0.995
3      NA   NaN ABC           NA      NaN
4    3710 19224 CDE            0.870    0.946
5    2099  6569 CDE            0.492    0.323
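For readers without dplyr, here is a base-R sketch of the same grouped scaling using `ave()`, which applies a function within each group and returns the results in the original row order. It defines a hypothetical helper `unit_length()` for illustration. Filtering with `is.finite()` instead of using `na.rm = TRUE` also keeps any `Inf` values out of the sum, which only matters if the data can contain infinities.

```r
# Rebuild the example data from the question
Recieve <- c(237, 1781, NA, 3710, 2099)
Sales <- c(2509, 25616, NaN, 19224, 6569)
Industry <- c("ABC", "ABC", "ABC", "CDE", "CDE")
data <- data.frame(Recieve, Sales, Industry, stringsAsFactors = FALSE)

# Hypothetical helper: divide by the Euclidean norm of the finite values only
unit_length <- function(x) {
  x / sqrt(sum(x[is.finite(x)]^2))
}

# Apply the helper within each Industry group; ave() keeps the row order
for (col in c("Recieve", "Sales")) {
  data[[paste0(col, "_new")]] <- ave(data[[col]], data$Industry, FUN = unit_length)
}

data
```

Row 3 stays missing in both new columns because its missing input passes straight through the division, while every other row matches the expected output.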