Linear regression model incorrectly calculated in R

Time:01-09

I have 3 different datasets that I have been plotting this way:

[Image: no description]

Each dataset was imported from a file to a data frame (respectively called vues, likes and commentaires), and contains the date and the corresponding data (either views, likes or comments) for each date. Now, I'd like to plot both linear models onto my graph (likes ~ views and comments ~ views).

Starting with the red one, I entered the following code:

abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=c(likes,vues)),col="red")

And this is what RStudio plots:

[Image: no description]

Now I don't understand if the problem comes from the dataset or somewhere else, but if I remove the data parameter, or just choose one of the two datasets, it still does the exact same thing, i.e. the following:

abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=likes),col="red")
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=vues),col="red")
abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30),col="red")

Here is my data:

> vues$X2022.12.30
  [1]   15900    8245    4531  546800    7149   10600    7774   45600  157100
 [10]  348300   15000    7363   24000    6073    6469    5848   13100  185600
 [19]   18700    7622  483800    6373   12000    7839   17100   10800    9846
 [28]    5671   10100    8330    9031  183000   17600    5153  117700   39600
 [37]   10300   27900   11200   29500  387800   15000    8968  465800   72500
 [46]    9501    5816    9761    5814   16200  269700    8905   16300   14700
 [55]  149600    7547  422600   40700   71100   18900  942000   12100   13400
 [64]  551900   16500   12000    8648  131900   10700   18400  183700   13500
 [73]   21500 1203000   14300   14700  108400    5233  388800  368400 1411000
 [82]  286400   17900  261500 1049000   13500   11200   74300 1312000    6044
 [91]   22200    9467    5975  143200    4552  502700    3971    9755   32000
[100]   46800    8844   31600    3671   60700    8249   20100   14500    3475
[109]    5745    2420  193700    2305   13500   90200    5746    5520   29200
[118]    7803    2502    4559    2120    3233  242100    5616    1371    1109
[127]    2123    2097    4019    1444    1515    2350   34600    2642  148000
[136]    2139  541400   13700   52600  421700    9876    3671   33600    6388
[145]   12300    3014   50200    2033   45900    5878    2221    1479

> likes$X2022.12.30
  [1]   1572    935    229  39000    471    944    472   2149  15400  42000   1346
 [12]    517   1977    488    569    462   1940  17200   2121    588  84800    587
 [23]    987    618   1229    862    947    278   1048    628    795  19200   1529
 [34]    319   9050   3119    868   2840    780   1912  40100   1130    759  47800
 [45]   4197    815    470    786    502   1068  33200    698   1145   1442  11200
 [56]    534  41600   3740   5119   2376  91700    904    983  20800    812    869
 [67]    571   6653    807   1356   7332   1005   1597 104700   1171    982  14300
 [78]    367  14900  29800 103500  11900   1073  22700  67700    872    894   3673
 [89] 116800    251   2229    593    392  20400    267  29200    449    569   1933
[100]   2260   1031   3035    311   6370   1014    812    956    241    641    116
[111]   6543    113    503   5505    450    410   2067    494     76    350    155
[122]    122  11400    350     51     42    109     96    200     62     53     98
[133]   1207    153  15500    101  56900    718   4498  23600    619    248   1803
[144]    437    983    234   4188    147   2623    591    176    138

And here is the code I used for plotting the graph if that is relevant:

plot.new()
par(mar=c(4,4,4,4))
par(new=TRUE)
par(bg="#FFECDE")
rect(par("usr")[1], par("usr")[3],
     par("usr")[2], par("usr")[4],
     col = c("#E1DEFF"))
par(new=TRUE)
plot(vues$X2022.12.30,likes$X2022.12.30,col="red",axes=FALSE,xlab="",ylab="",
     main="Nombre de j'aime et de commentaires en fonction du nombre de vues",
     pch=-0x2022,bg="red")
axis(2,ylim=c(0,120000),col="red", col.axis="red",at=seq(0, 120000, by=20000)) 
mtext("Nombre de j'aime",side=2,line=2.5,col="red")
box()
par(new=TRUE)
plot(vues$X2022.12.30,commentaires$X2022.12.30,col="blue",axes=FALSE,xlab="",
     ylab="",ylim=c(0,1500),pch=-0x2022,bg="blue")
axis(4,col="blue",col.axis="blue",at=seq(0, 1500, by=250)) 
mtext("Nombre de commentaires",side=4,line=2.5,col="blue")
axis(1,xlim=c(0,1500000),ylim=c(0,145000),col="black",col.axis="black",
     at=seq(0, 1400000, by=100000)) 
mtext("Nombre de vues",side=1,line=2.5,col="black")
legend(x="topleft",legend=c("J'aime","Commentaires"),
       text.col=c("black","black"),pch=c(-0x2022,-0x2022),col=c("red","blue"),
       bg=c("#C9FFF1"),inset=0.02)

Answer:

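Calling par(new=TRUE) and then plotting the comments data on top resets the plot's coordinate system to the comments y-axis scale (ylim=c(0,1500) in your code). abline() always draws in the coordinate system that is currently in effect, so a line whose intercept and slope are on the likes scale ends up drawn against the comments scale.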
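One way to check this is to look at par("usr"), which holds the limits of the current coordinate system, before and after the overlay. This is only a sketch with made-up numbers; x, y1 and y2 are hypothetical and stand in for views, likes and comments.

```r
# Made-up data, only to show how the coordinate system changes
x  <- c(1, 2, 3, 4, 5)
y1 <- c(1000, 2100, 2900, 4200, 5000)   # large scale, like the likes
y2 <- c(3, 1, 4, 1, 5)                  # small scale, like the comments

plot(x, y1)
usr_first <- par("usr")    # c(xmin, xmax, ymin, ymax) for the first plot

par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
usr_second <- par("usr")   # the y limits now follow y2

print(usr_first[3:4])      # y limits while the first plot was active
print(usr_second[3:4])     # y limits after the overlay
```

Anything drawn with abline() after the second plot() is placed using the second set of limits.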

The simple solution is to call abline() for the likes regression line before you call par(new=TRUE) and add the comments data.

Example:

dd <- data.frame(vues=
c(15900,8245,4531,546800,7149,10600,7774,45600,157100,
348300,15000,7363,24000,6073,6469,5848,13100,185600,
18700,7622,483800,6373,12000,7839,17100,10800,9846,
5671,10100,8330,9031,183000,17600,5153,117700,39600,
10300,27900,11200,29500,387800,15000,8968,465800,72500,
9501,5816,9761,5814,16200,269700,8905,16300,14700,
149600,7547,422600,40700,71100,18900,942000,12100,13400,
551900,16500,12000,8648,131900,10700,18400,183700,13500,
21500,1203000,14300,14700,108400,5233,388800,368400,1411000,
286400,17900,261500,1049000,13500,11200,74300,1312000,6044,
22200,9467,5975,143200,4552,502700,3971,9755,32000,
46800,8844,31600,3671,60700,8249,20100,14500,3475,
5745,2420,193700,2305,13500,90200,5746,5520,29200,
7803,2502,4559,2120,3233,242100,5616,1371,1109,
2123,2097,4019,1444,1515,2350,34600,2642,148000,
2139,541400,13700,52600,421700,9876,3671,33600,6388,
12300,3014,50200,2033,45900,5878,2221,1479),
likes = c(1572,935,229,39000,471,944,472,2149,15400,42000,1346,
517,1977,488,569,462,1940,17200,2121,588,84800,587,
987,618,1229,862,947,278,1048,628,795,19200,1529,
319,9050,3119,868,2840,780,1912,40100,1130,759,47800,
4197,815,470,786,502,1068,33200,698,1145,1442,11200,
534,41600,3740,5119,2376,91700,904,983,20800,812,869,
571,6653,807,1356,7332,1005,1597,104700,1171,982,14300,
367,14900,29800,103500,11900,1073,22700,67700,872,894,3673,
116800,251,2229,593,392,20400,267,29200,449,569,1933,
2260,1031,3035,311,6370,1014,812,956,241,641,116,
6543,113,503,5505,450,410,2067,494,76,350,155,
122,11400,350,51,42,109,96,200,62,53,98,
1207,153,15500,101,56900,718,4498,23600,619,248,1803,
437,983,234,4188,147,2623,591,176,138))
set.seed(101)
dd$other <- runif(nrow(dd), min=0, max = 1500)

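# First plot: likes vs. views, with its regression line added right away
plot(likes ~ vues, data = dd)
abline(lm(likes ~ vues, data = dd))
# Overlay the second series; this resets the y-axis scale to that of 'other'
par(new = TRUE)
plot(other ~ vues, data = dd, axes = FALSE, col = 2)
# Same fit drawn after the overlay: it is now placed on the 'other' scale,
# so it no longer matches the likes points
abline(lm(likes ~ vues, data = dd), col = 4, lwd = 2)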
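
Applied to the layout in the question, the same idea might look like the sketch below: each regression line is added while its own y-axis scale is still active. The data frame d and its values are made up (the comments data was not posted), so treat the names and numbers as assumptions. The columns are put in one data frame and the formulas use bare column names; when a formula is written as likes$X2022.12.30 ~ vues$X2022.12.30, the vectors are looked up directly and the data argument has no effect, which would explain why changing it made no difference.

```r
# Hypothetical stand-in for the three data frames, merged into one
set.seed(1)
d <- data.frame(vues = round(runif(50, 1000, 1500000)))
d$likes        <- pmax(0, round(0.08  * d$vues + rnorm(50, sd = 5000)))
d$commentaires <- pmax(0, round(0.001 * d$vues + rnorm(50, sd = 100)))

# Fit both models on bare column names so that 'data' is actually used
fit_likes    <- lm(likes ~ vues, data = d)
fit_comments <- lm(commentaires ~ vues, data = d)

par(mar = c(4, 4, 4, 4))

# Likes: plot, then add the line while the likes scale is active
plot(d$vues, d$likes, col = "red", xlab = "Nombre de vues", ylab = "")
abline(fit_likes, col = "red")
mtext("Nombre de j'aime", side = 2, line = 2.5, col = "red")

# Comments: overlay, then add the line while the comments scale is active
par(new = TRUE)
plot(d$vues, d$commentaires, col = "blue", axes = FALSE, xlab = "", ylab = "")
abline(fit_comments, col = "blue")
axis(4, col = "blue", col.axis = "blue")
mtext("Nombre de commentaires", side = 4, line = 2.5, col = "blue")
```

Both plot() calls use the same x values, so the x-axis limits match between the two layers and only the y scale changes.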
