Simple linear transformation of variable in R: changing the scope of a variable. How to make it right

Time:09-01

I am trying to change the value range of a variable (array, set of values) while keeping its properties. I don't know the exact mathematical name, but I mean a transformation after which the values keep the same relative spacing and only the range differs. Maybe the code below will explain what I mean.

I just want to "linearly transpose" (or something like that) the values to some other range so that the shape of the distribution stays the same. In other words, I'll change the range of the variable using the regression equation y = a * x + b. I assume that the transformation is completely linear and that the correlation between the variables is exactly 1, and I calculate the new variable (array) from a regression equation, actually a system of two equations into which I substitute the minimum and maximum of both variables:

minimum.y1 = minimum.x1 * a + b
maximum.y2 = maximum.x2 * a + b

from which I can work out the following code to obtain a and b coefficients:

# this is my input variable
x <- c(-1, -0.5, 0, 0.5, 1)
# this is the range i want to obtain
y.pred <- c(1,2,3,4,5)

max_y = 5
min_y = 1

min_x = min(x)
max_x = max(x)
c1 = max_x-min_x
c2 = max_y-min_y

a.coeff = c2/c1
b.coeff = a.coeff-min_x
y = x * a.coeff + b.coeff
y
# hey, it works! :)
[1] 1 2 3 4 5

The correlation between the variable before and after the transformation is exactly 1, so we have a basis for further action. Let's wrap it in a function:

linscale.to.int <- function(max.lengt, vector) {
  max_y = max.lengt
  min_y = 1

  min_x = min(vector)
  max_x = max(vector)
  c1 = max_x-min_x
  c2 = max_y-min_y

  a.coeff = c2/c1
  b.coeff = a.coeff-min_x
  return(vector * a.coeff + b.coeff)
}
x <- c(-1, -0.5, 0, 0.5, 1)
linscale.to.int(5,x)
[1] 1 2 3 4 5

And it works again. But here's the thing: when I apply this function to a random sample, like this:

x.rand <- rnorm(50)
y.rand <- linscale.to.int(5,x.rand)
plot(x.rand, y.rand)

or, more visibly, like this:

x.rand <- rnorm(500)
y.rand <- linscale.to.int(20,x.rand)
plot(x.rand, y.rand)

I get values of the second variable that are completely out of range: they should be between 1 and 20, but I get values from about -1 to 15:

[Figure: invalid output range of the function]

And now the question arises - what am I doing wrong here? Where do I go wrong with such a transformation?

CodePudding user response:

What you are trying to do is very straightforward using rescale from the scales package (which you will already have installed if you have ggplot2 / tidyverse installed). Simply give it the new minimum / maximum values:

x <- c(-1, -0.5, 0, 0.5, 1)

scales::rescale(x, c(1, 5))
#> [1] 1 2 3 4 5

If you want to have your own function written in base R, the following one-liner should do what you want:

linscale_to_int <- function(y, x) (x - min(x)) * (y - 1) / diff(range(x)) + 1

(Note that it is good practice in R to avoid periods in function names because this can cause confusion with S3 method dispatch)
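As for why the function in the question drifts out of range, here is a sketch of the reasoning, not a definitive account of the asker's intent. Solving the two equations min_y = a * min_x + b and max_y = a * max_x + b gives a = (max_y - min_y) / (max_x - min_x) and b = min_y - a * min_x. The question instead computes b.coeff as a.coeff - min_x, which lands on the right minimum only when min_x is -1 (or a is 1), as happens in the first example. The name linscale_fixed and its min_y argument below are hypothetical, introduced only for illustration:

```r
# Hypothetical corrected version of the question's function.
# Assumption: the target range is [min_y, max_y], with min_y defaulting to 1.
linscale_fixed <- function(max_y, x, min_y = 1) {
  a <- (max_y - min_y) / (max(x) - min(x))  # slope
  b <- min_y - a * min(x)                   # intercept from min_y = a * min_x + b
  a * x + b
}

x <- c(-1, -0.5, 0, 0.5, 1)
linscale_fixed(5, x)
#> [1] 1 2 3 4 5
```

For this x the slope is 2 and the intercept is 1 - 2 * (-1) = 3, the same value the question's formula 2 - (-1) happens to give, which is why the bug stays hidden until the input minimum is something other than -1.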

Testing, we have:

x <- c(-1, -0.5, 0, 0.5, 1)
linscale_to_int(5, x)
#> [1] 1 2 3 4 5

x.rand <- rnorm(50)
y.rand <- linscale_to_int(5, x.rand)
plot(x.rand, y.rand)

y.rand <- linscale_to_int(20, x.rand)
plot(x.rand, y.rand)
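
To check the ranges numerically instead of reading them off the plots, something like the following should work. It repeats the one-liner above next to the function from the question (renamed linscale_old here, a hypothetical name), fixes an arbitrary seed so the run is reproducible, and compares the output ranges:

```r
# One-liner from the answer
linscale_to_int <- function(y, x) (x - min(x)) * (y - 1) / diff(range(x)) + 1

# Function from the question, logic kept as posted (hypothetical name)
linscale_old <- function(max.lengt, vector) {
  a.coeff <- (max.lengt - 1) / (max(vector) - min(vector))
  b.coeff <- a.coeff - min(vector)  # the problematic intercept
  vector * a.coeff + b.coeff
}

set.seed(1)  # arbitrary seed, only for reproducibility
x.rand <- rnorm(500)

y.new <- linscale_to_int(20, x.rand)
y.old <- linscale_old(20, x.rand)

isTRUE(all.equal(range(y.new), c(1, 20)))  # TRUE: output spans 1 to 20
isTRUE(all.equal(range(y.old), c(1, 20)))  # FALSE: shifted off the target range
all.equal(cor(x.rand, y.new), 1)           # TRUE: still a perfectly linear map
```

Both versions use the same slope, so the old output has the correct width of 19; only the intercept differs, which shifts the whole result away from the requested interval.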

Created on 2022-08-31 with reprex v2.0.2
