Removing outliers linear regression

Time:10-21

I'm running data through a linear regression and have discovered outliers. I've tried data=dataframe[-c("country1", "country2"),] but the outliers still appear. Can I get some help here? Thank you.

#Remove outliers
fit <- lm(Robbery ~ Unlawful.acts.involving.controlled.drugs.or.precursors, 
          data=NoNACountry[-c("Spain", "Luxembourg"),])
par(mfrow=c(2,2))
plot(fit)

I think I've somehow lost the row names, because I think Robbery and Unlawful.acts... have become vectors? I used the country names as row labels, and Robbery and Unlawful.acts... are columns. With guidance from here I have been able to use drop = FALSE in other code, but I have not been able to incorporate that approach here.

Dataframe information is below

structure(list(Intentional.homicide = c(2.03, 0.84, 1.14), Attempted.intentional.homicide = c(3.25, 
1.93, 0.54), Assault = c(5.52, 43.29, 39.54), Kidnapping = c(0.14, 
0.07, 1.03), Sexual.violence = c(5.38, 50.9, 8.64), Robbery = c(3.42, 
29.67, 16.9), Unlawful.acts.involving.controlled.drugs.or.precursors = c(70.26, 
494.05, 78.14), Country.Totals.per.000s = c(90, 620.75, 145.93
)), row.names = c("Albania", "Austria", "Bulgaria"), class = "data.frame")

CodePudding user response:

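Negative indexing only works with numeric positions, not with names. Since you are using a data.frame with row names, you could select the rows to keep with a logical test instead: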

NoNACountry[!row.names(NoNACountry) %in% c("Spain", "Luxembourg"),]
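
As a rough sketch of the answer above, the snippet below rebuilds two columns of the three sample rows from the question and compares both indexing styles. The sample does not contain Spain or Luxembourg, so "Austria" is used as a stand-in row to drop; that choice and the names neg_index, drop_rows and kept are assumptions made only for illustration.

```r
# Three rows taken from the question's dput() output (two columns only)
NoNACountry <- data.frame(
  Robbery = c(3.42, 29.67, 16.9),
  Unlawful.acts.involving.controlled.drugs.or.precursors = c(70.26, 494.05, 78.14),
  row.names = c("Albania", "Austria", "Bulgaria")
)

# Unary minus is not defined for character vectors, so this raises an
# error; tryCatch() captures the message instead of stopping the script
neg_index <- tryCatch(
  NoNACountry[-c("Austria"), ],
  error = function(e) conditionMessage(e)
)
print(neg_index)

# Hypothetical stand-in for c("Spain", "Luxembourg") from the question
drop_rows <- c("Austria")

# Keep every row whose name is NOT in drop_rows
kept <- NoNACountry[!row.names(NoNACountry) %in% drop_rows, ]
print(row.names(kept))
```

If the original call stopped with an error like this, fit may never have been reassigned, which would be one possible reason the old plots kept appearing. The filtered frame can be passed to lm() through the data argument in place of the negative-index expression.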

CodePudding user response:

It's often convenient to move the row names into a column so they can be filtered with standard data.frame methods.

With tibble::rownames_to_column() (tibble is loaded by library(tidyverse)):

library(tidyverse)

mtcars %>% 
  rownames_to_column(var = "car_name") %>% 
  head()

           car_name  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

In base R:

mtcars$car_name <- rownames(mtcars)

Once the row names are in a column, you can filter with bracket notation or with dplyr::filter(). For instance:

# vector of car names to keep
cars_keep <- c("Volvo 142E", "Maserati Bora")

# base R
mtcars[which(mtcars$car_name %in% cars_keep), ]

# dplyr
filter(mtcars, car_name %in% cars_keep)
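
The example above keeps the listed cars. To tie this back to the question, here is a minimal base R sketch that instead drops the listed rows and refits a regression. The model mpg ~ wt, the treatment of these two cars as outliers, and the names cars, cars_drop and cars_kept are all assumptions chosen only for illustration.

```r
# Work on a copy so the built-in mtcars is left untouched
cars <- mtcars
cars$car_name <- rownames(cars)

# Hypothetical "outliers", picked only to demonstrate the filtering
cars_drop <- c("Volvo 142E", "Maserati Bora")

# Negate the %in% test to keep every row whose name is NOT in cars_drop
cars_kept <- cars[!cars$car_name %in% cars_drop, ]

# Refit the model on the filtered data
fit <- lm(mpg ~ wt, data = cars_kept)

print(nrow(cars_kept))  # rows remaining after the filter
print(nobs(fit))        # observations actually used in the fit
```

Comparing nrow() of the filtered frame with nobs() of the fit is a quick way to confirm the model was refitted on the reduced data before calling plot(fit) again.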