Unexpected result while using lowess to smooth a data.table column in R

Time:11-09

I have a data.table test_dt in which I want to smooth the y column using the lowess function.

test_dt <- structure(list(x = c(28.75, 30, 31.25, 32.5, 33.75, 35, 36.25, 
37.5, 38.75, 40, 41.25, 42.5, 43.75, 45, 46.25, 47.5, 48.75, 
50, 52.5, 55, 57.5, 60, 62.5, 63.75, 65, 67.5, 70, 72.5, 75, 
77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 102.5, 103.75, 
105, 106.25, 107.5, 108.75, 110, 111.25, 112.5, 113.75, 115, 
116.25, 117.5, 118.75, 120, 121.25, 122.5, 125, 130, 135, 140, 
145), y = c(116.78, 115.53, 114.28, 113.05, 111.78, 110.53, 109.28, 
108.05, 106.78, 105.53, 104.28, 103.025, 101.775, 100.525, 99.28, 
98.05, 96.8, 95.525, 93.1, 90.65, 88.225, 85.775, 83.35, 82.15, 
80.9, 78.5, 76.075, 73.675, 71.25, 68.85, 66.5, 64.075, 61.725, 
59.4, 57.075, 54.725, 52.475, 50.225, 48, 45.75, 44.65, 43.55, 
42.475, 41.45, 40.35, 39.275, 38.25, 37.225, 36.175, 35.175, 
34.175, 33.225, 32.275, 31.3, 30.35, 29.45, 27.625, 24.175, 21, 
18.125, 15.55), z = c(116.778248424972, 115.531456655985, 114.284502467544, 
113.034850770519, 111.784500981402, 110.533319511795, 109.284500954429, 
108.034850457264, 106.784502297216, 105.531265565238, 104.278221015846, 
103.026780249377, 101.775992395759, 100.528761292272, 99.2853168637851, 
98.043586202838, 96.8021989104315, 95.5702032427799, 93.1041279347743, 
90.6575956222915, 88.2179393348852, 85.783500434839, 83.3503011023971, 
82.136280706039, 80.922846825298, 78.4965179152157, 76.0823895453039, 
73.6686672097464, 71.264486719796, 68.8702598156142, 66.4865368523571, 
64.1182523898466, 61.7552221811808, 59.4004347738795, 57.0823289450761, 
54.7908645949795, 52.5071096685879, 50.2308279167219, 47.9940967492558, 
45.7658417529877, 44.6514226583931, 43.5622751034012, 42.4876666190815, 
41.4173110074806, 40.3555584369672, 39.3004471381618, 38.2552969838653, 
37.2202353638959, 36.1963659189447, 35.1889616530209, 34.2004259883859, 
33.2295174626826, 32.2669278456991, 31.3171387914754, 30.3742375589802, 
29.4555719783757, 27.6243725086786, 23.9784367995753, 27.625, 
27.625, 27.625)), row.names = c(NA, -61L), class = c("data.table", 
"data.frame"))

As can be seen in the image below, I am getting an unexpected result. The expected result is that the line (z column) in the graph below should closely follow the points (y column).

[Image: Smoothing using lowess]

Here is my code:

library(data.table)
library(ggplot2)
test_dt[, z := lowess(x = x, y = y, f = 0.1)$y]
ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))

Q1. Can someone suggest why lowess is not smoothing properly?

Q2. Since lowess is not working as expected, is there any other function in R that would be more efficient in smoothing the y column without producing a spike (as lowess did on the boundary points)?

CodePudding user response:

You could use loess instead:

test_dt[, z := predict(loess(y ~ x, data = test_dt))]

ggplot(test_dt) + geom_point(aes(x, y)) + geom_line(aes(x, z))
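If the default loess fit turns out smoother than you want, the span argument controls how much of the data each local fit uses (the default is 0.75). The following is only a sketch on made-up data in a plain data frame, not the test_dt from the question; the names d, fit_default and fit_narrow are hypothetical, and span = 0.3 is an arbitrary choice for illustration.

```r
# Hypothetical example data: a smooth curve with a small deterministic wiggle.
d <- data.frame(x = seq(0, 10, length.out = 50))
d$y <- sin(d$x) + 0.05 * cos(7 * d$x)

# Default loess: span = 0.75, local quadratic fits.
fit_default <- loess(y ~ x, data = d)

# Narrower span: each local fit uses about 30% of the points,
# so the smoothed curve can follow the data more closely.
fit_narrow <- loess(y ~ x, data = d, span = 0.3)

# Fitted values at the original x positions.
d$z_default <- predict(fit_default)
d$z_narrow  <- predict(fit_narrow)

# Compare how far each fit strays from the observations.
print(c(default = max(abs(d$z_default - d$y)),
        narrow  = max(abs(d$z_narrow - d$y))))
```

The same span argument can be passed in the data.table call above, for example predict(loess(y ~ x, data = test_dt, span = 0.3)).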

Note, though, that if all you want to do is plot the line, this is exactly the method that geom_smooth uses by default for a small data set like this one, so without even creating a z column, you could do:

ggplot(test_dt, aes(x, y)) + geom_point() + geom_smooth()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Created on 2021-11-07 by the reprex package (v2.0.0)

CodePudding user response:

The problem was solved by setting the number of robustifying iterations to zero (iter = 0).

test_dt[, z := lowess(x = x, y = y, f = 0.1, iter=0)$y]
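A likely reason this helps: lowess defaults to iter = 3, and each robustifying iteration down-weights points whose residuals are large relative to the median absolute residual. When most of the curve is fitted almost exactly, even modest residuals near the sparse right-hand end can be treated as outliers and effectively ignored. The sketch below compares the two settings on made-up data that only loosely imitates the shape of the question's data; the vectors x and y and the names fit_robust, fit_plain and max_dev are hypothetical, and whether a spike appears depends on the data.

```r
# Hypothetical data: a nearly linear, densely sampled stretch followed by
# a few widely spaced points on a gentle curve.
x <- c(seq(30, 120, by = 1.25), 125, 130, 135, 140, 145)
y <- ifelse(x <= 120,
            145 - x,
            25 - 0.7 * (x - 120) + 0.012 * (x - 120)^2)
y <- y + 0.01 * sin(x)  # small deterministic wiggle, standing in for rounding

# Default behaviour: three robustifying iterations (iter = 3).
fit_robust <- lowess(x, y, f = 0.1)

# No robustifying iterations: a plain locally weighted linear fit.
fit_plain <- lowess(x, y, f = 0.1, iter = 0)

# Largest deviation of each fit from the observed values.
max_dev <- c(robust = max(abs(fit_robust$y - y)),
             plain  = max(abs(fit_plain$y - y)))
print(max_dev)
```

Dropping the robustifying step is reasonable when the data contain no genuine outliers, as here; with noisy data that does contain outliers, iter = 0 lets them pull the curve.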