Implement unit scaling per industry in R

I'm creating new features using a unit-length scaling approach. This is the data:

Recieve = c(237, 1781, NA, 3710, 2099)
Sales = c(2509, 25616, NaN, 19224, 6569)
Industry = c("ABC", "ABC", "ABC",  "CDE", "CDE")
data = data.frame(Recieve, Sales, Industry, stringsAsFactors = FALSE)

> data
  Recieve Sales Industry
1     237  2509      ABC
2    1781 25616      ABC
3      NA   NaN      ABC
4    3710 19224      CDE
5    2099  6569      CDE

I want to create new features such as Recieve_new and Sales_new by applying the unit-length scaling formula within each industry. The formula is:

unitLength = x / sqrt(sum(x^2))

For example, for the entry Recieve = 237 in Industry = "ABC", the unit length is calculated as follows:

unitLength = 237 / sqrt((237^2) + (1781^2))
unitLength = 237 / sqrt(56169 + 3171961)
unitLength = 237 / sqrt(3228130)
unitLength = 237 / 1796.69975232
unitLength = 0.13190851709
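
The arithmetic above can be checked directly in R. This is only a minimal sketch that hard-codes the two finite Recieve values for industry "ABC"; the variable names `x`, `norm_abc` and `unit_length` are illustrative and not part of the original data.

```r
# Finite Recieve values for industry "ABC" from the data above
x <- c(237, 1781)

# Euclidean (L2) norm of the group: sqrt(237^2 + 1781^2)
norm_abc <- sqrt(sum(x^2))

# Each value divided by the group norm
unit_length <- x / norm_abc
print(unit_length)
```

After this scaling, the squared values within a group sum to 1, which is a quick way to sanity-check the result.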

The calculation should use only finite values; non-finite values (NA, NaN) should be excluded from the sum. I want to implement this in R. The expected output is:

  Recieve Sales Industry Recieve_new  Sales_new
1     237  2509      ABC   0.1319085 0.09748012
2    1781 25616      ABC   0.9912619 0.99523747
3      NA   NaN      ABC          NA         NA
4    3710 19224      CDE   0.8703574 0.94627897
5    2099  6569      CDE   0.4924205 0.32335136

Can anyone help me with this?

User response:

A tidyverse solution:

library(dplyr)

data %>% 
  group_by(Industry) %>% 
  # divide each value by the Euclidean norm of its group's non-missing values
  mutate(across(c(Recieve, Sales), 
                ~ .x / sqrt(sum(.x^2, na.rm = TRUE)), 
                .names = "{.col}_new"))

  Recieve Sales Industry Recieve_new Sales_new
    <dbl> <dbl> <chr>          <dbl>     <dbl>
1     237  2509 ABC            0.132    0.0975
2    1781 25616 ABC            0.991    0.995 
3      NA   NaN ABC           NA      NaN     
4    3710 19224 CDE            0.870    0.946 
5    2099  6569 CDE            0.492    0.323
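
If you prefer to avoid the tidyverse, the same per-industry scaling can probably be done in base R with `ave()`. The following is a rough sketch rather than a tested drop-in: the helper name `unit_length` is made up for illustration, and it uses `is.finite()` to pick the values that enter the norm, so it also ignores `Inf`/`-Inf`, which `na.rm = TRUE` alone would not.

```r
# Same data as in the question
Recieve  <- c(237, 1781, NA, 3710, 2099)
Sales    <- c(2509, 25616, NaN, 19224, 6569)
Industry <- c("ABC", "ABC", "ABC", "CDE", "CDE")
data <- data.frame(Recieve, Sales, Industry, stringsAsFactors = FALSE)

# Hypothetical helper: scale a vector to unit length,
# using only its finite entries to compute the norm
unit_length <- function(x) {
  x / sqrt(sum(x[is.finite(x)]^2))
}

# ave() applies the helper within each Industry group and keeps the row order
for (col in c("Recieve", "Sales")) {
  data[[paste0(col, "_new")]] <- ave(data[[col]], data$Industry, FUN = unit_length)
}

print(data)
```

Rows whose input is NA or NaN stay missing in the new columns, because only the denominator is restricted to finite values.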