Kernel regression with 2 independent variables

Time:10-09

For the Nadaraya–Watson kernel regression estimate, we use the following in R:
ksmooth(x, y, kernel = c("box", "normal"), bandwidth = 0.5, range.x = range(x), n.points = max(100L, length(x)), x.points)

What am I supposed to use when I have two independent variables, say x1 and x2, instead of a single x? How do I change range.x, n.points, and x.points?

Answer:

You can perform kernel regression with multiple independent variables using npreg from the np package:

library(np)

mod <- npreg(mpg ~ wt + hp, data = mtcars)

mod
#> 
#> Regression Data: 32 training points, in 2 variable(s)
#>                      wt       hp
#> Bandwidth(s): 0.2401575 16.63531
#> 
#> Kernel Regression Estimator: Local-Constant
#> Bandwidth Type: Fixed
#> 
#> Continuous Kernel Type: Second-Order Gaussian
#> No. Continuous Explanatory Vars.: 2
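To make the "Local-Constant" estimator in this output more concrete, here is a minimal base R sketch of a Nadaraya–Watson estimate with two predictors. It assumes a product of Gaussian kernels, one per variable, and reuses the bandwidths printed above; nw_estimate and its argument names are made up for illustration and are not part of np. The result should be in the same ballpark as what npreg predicts at that point, but treat it as a sketch of the idea rather than a reimplementation of the package.

```r
# Nadaraya-Watson (local-constant) estimate at one point (x1_0, x2_0).
# Each observation is weighted by a product of Gaussian kernels,
# one per predictor; the estimate is the weighted mean of y.
# nw_estimate is a hypothetical helper, not an np function.
nw_estimate <- function(x1_0, x2_0, x1, x2, y, h1, h2) {
  w <- dnorm((x1 - x1_0) / h1) * dnorm((x2 - x2_0) / h2)
  sum(w * y) / sum(w)
}

# Bandwidths copied from the npreg output above
h_wt <- 0.2401575
h_hp <- 16.63531

# Estimated mpg for an illustrative car with wt = 3 and hp = 150
est <- nw_estimate(3, 150, mtcars$wt, mtcars$hp, mtcars$mpg, h_wt, h_hp)
est
```

Because the weights are non-negative, the estimate is always a weighted average of the observed mpg values, so it cannot fall outside their range.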

Just as kernel regression with a single independent variable can be represented as a curved line in 2D space, with two independent variables we have a curved surface in 3D space. We can show this graphically by predicting over a grid of (wt, hp) locations. Here is an example showing a surface created by kernel regression that estimates the value of mpg given wt and hp for the mtcars data set:

library(ggplot2)

newdat <- expand.grid(hp = seq(50, 350, 0.1), wt = seq(1, 5.5, 0.02))
newdat$mpg <- predict(mod, newdata = newdat)

ggplot(newdat, aes(wt, hp, fill = mpg)) +
  geom_tile() +
  geom_point(data = mtcars, shape = 21, size = 3) +
  scale_fill_viridis_c()

[Figure: heat map of predicted mpg over the wt–hp grid, with the mtcars observations overlaid as points]
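As for range.x, n.points and x.points: ksmooth only takes a single x, so these arguments have no direct two-variable form. One way to get the same effect, sketched below in base R, is to build the evaluation locations yourself and pass them as newdata to predict. The names n_points, grid and x_points are illustrative, and the resolution of 50 per axis is an arbitrary choice.

```r
# Analogue of range.x and n.points: an evenly spaced grid
# spanning the observed range of each predictor.
n_points <- 50  # illustrative resolution per axis
grid <- expand.grid(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = n_points),
  hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = n_points)
)

# Analogue of x.points: explicit (wt, hp) locations to evaluate at.
x_points <- data.frame(wt = c(2.5, 3.0, 3.5), hp = c(100, 150, 200))

dim(grid)  # 2500 rows, 2 columns

# With the fitted model from earlier, either data frame can be used as:
# predict(mod, newdata = x_points)
```

A coarser grid like this one also predicts much faster than a very fine one, at the cost of a blockier plot.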

Created on 2022-10-08 with reprex v2.0.2
