I have a data frame with 20 columns, but here is a smaller reproducible example:
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20,50,29)
I want to normalize each column by the corresponding element of "max_values". Please do not assume that a column's observed maximum equals the maximum I want it normalized against. It is fine if the results go above 1 or below 0: the max values are thresholds, and I would like to observe how far my data goes beyond or below them. We can assume that the min values are ALWAYS 0, so I dropped them from the equation:
normalize <- function(x,y) {
return ((x - 0) / (y - 0))
}
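A quick check of normalize() on a single column shows that values above the threshold simply map above 1, which is the behaviour asked for. The readings vector below is made up purely for illustration.

```r
# The question's normalize(): divide a value (or vector) by its threshold.
# The minimum is assumed to be 0, as stated in the question.
normalize <- function(x, y) {
  (x - 0) / (y - 0)
}

# Hypothetical readings for one column whose threshold is 20
readings <- c(0, 10, 20, 25)
normalize(readings, 20)
# [1] 0.00 0.50 1.00 1.25
```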
lapply(test_df, normalize)
I have written the code above, but I do not know how to make each iteration use the corresponding element of "max_values".
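One minimal way to get the pairing described here is to loop over column positions instead of column contents, so that position i picks up max_values[i]. This is a sketch, not taken from the answers; set.seed(42) and the names normalized / normalized_df are illustrative assumptions.

```r
set.seed(42)  # assumption: only for a reproducible example
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

normalize <- function(x, y) {
  (x - 0) / (y - 0)
}

# Iterate over positions so column i is paired with max_values[i]
normalized <- lapply(seq_along(test_df), function(i) normalize(test_df[[i]], max_values[i]))
names(normalized) <- names(test_df)

# Turn the list of normalized columns back into a data frame
normalized_df <- as.data.frame(normalized)
```

This relies on test_df and max_values listing the columns in the same order.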
Answer:
You could use scale():
scale(test_df, center = FALSE, scale = max_values)
# a b c
#[1,] 0.85 0.98 0.4827586
#[2,] 0.25 0.94 0.6896552
#[3,] 0.05 0.48 0.8965517
#[4,] 0.50 0.14 0.6206897
#[5,] 0.20 0.72 0.5172414
#[6,] 0.10 0.50 0.1034483
#[7,] 1.00 0.74 0.3103448
#attr(,"scaled:scale")
#[1] 20 50 29
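scale() returns a matrix carrying a "scaled:scale" attribute rather than a data frame. If a data frame is wanted afterwards, wrapping the call in as.data.frame() is one option; the data and variable names here are illustrative.

```r
set.seed(42)  # assumption: matches the data used in this answer
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

# center = FALSE skips centering; scale = max_values divides column i by max_values[i]
scaled <- scale(test_df, center = FALSE, scale = max_values)

# Convert the resulting matrix back into a data frame with columns a, b, c
scaled_df <- as.data.frame(scaled)
```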
Or divide by a list (one element per column):
test_df / as.list(max_values)
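Dividing a data frame by a list applies one list element to each column. sweep() is another base R way to express the same column-wise division, with the margin made explicit; this comparison is a sketch added for illustration, not part of the original answer.

```r
set.seed(42)  # assumption: matches the data used in this answer
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

# One list element per column: column i is divided by max_values[i]
by_list <- test_df / as.list(max_values)

# sweep() with MARGIN = 2 divides each column by the matching STATS entry
by_sweep <- sweep(test_df, 2, max_values, "/")

all.equal(by_list, by_sweep)
# [1] TRUE
```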
Data:
set.seed(42)
test_df <- data.frame(a = sample(1:20, 7),
b = sample(1:50, 7),
c = sample(1:29, 7))
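With this data, a rough way to confirm that the approaches in this thread agree is to compare their values while ignoring attributes, since each returns a different type (matrix, data frame, transposed matrix). The variable names are illustrative.

```r
set.seed(42)
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

via_scale <- scale(test_df, center = FALSE, scale = max_values)   # matrix with "scaled:scale"
via_list  <- test_df / as.list(max_values)                         # data frame
via_apply <- t(apply(test_df, 1, function(x) x / max_values))      # matrix

# Compare values only; dimnames and the "scaled:scale" attribute differ by design
all.equal(as.matrix(via_list), via_scale, check.attributes = FALSE)
all.equal(via_apply, via_scale, check.attributes = FALSE)
```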
Answer:
Try this:
t(apply(test_df,1,function(x) x/max_values))
a b c
[1,] 0.40 0.74 0.7586207
[2,] 0.65 0.40 0.6206897
[3,] 0.50 0.70 0.2413793
[4,] 0.60 1.00 0.9310345
[5,] 0.10 0.04 0.6551724
[6,] 0.95 0.80 0.8275862
[7,] 0.20 0.66 0.1034483
As long as max_values and test_df have their columns in the same order, you just need to go row by row. Annoyingly, apply() returns the result with rows and columns switched; t() switches them back.
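A vectorised variant of the same idea (not from the original answer) leans on R's recycling rule instead of apply(): after transposing, each original column becomes a row, and a length-3 vector recycled down the columns of that matrix divides row i by max_values[i]. Transposing again restores the original layout. The data setup is an illustrative assumption.

```r
set.seed(42)  # assumption: illustrative data
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

# t(test_df) is a 3 x 7 matrix with one row per original column.
# max_values is recycled down each of its columns, so row i is divided
# by max_values[i]; the outer t() gives back a 7 x 3 matrix.
normalized <- t(t(test_df) / max_values)
```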
Answer:
Use mapply() if your function takes more than one argument:
mapply(normalize, test_df, max_values)
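mapply() walks test_df and max_values in parallel and, by default, simplifies the result to a matrix. If a data frame is preferred, Map(), which is mapply() with SIMPLIFY = FALSE, returns a named list that as.data.frame() can convert. The data setup and variable names below are illustrative.

```r
set.seed(42)  # assumption: illustrative data
test_df <- data.frame(a = sample(1:20, 7), b = sample(1:50, 7), c = sample(1:29, 7))
max_values <- c(20, 50, 29)

normalize <- function(x, y) {
  (x - 0) / (y - 0)
}

# Pairs column i of test_df with max_values[i]; simplified to a 7 x 3 matrix
as_matrix <- mapply(normalize, test_df, max_values)

# Same pairing, kept as a named list, then turned into a data frame
as_df <- as.data.frame(Map(normalize, test_df, max_values))
```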