I have well over 100,000 GPS locations for 35 animals. I have already removed records with NA or 0 latitude-longitude values, but I noticed one location that is clearly incorrect and needs to be removed (in the data subset below, the 4th data line, with a LATITUDE of -78.6917357 and a LONGITUDE of 17.5506138). There are likely other incorrect GPS locations as well, so I am wondering whether there is an easy way to identify such outliers and remove them.
My sample data looks like this:
COLLAR NAME Animal_ID SEX DATE TIME Year Month Day Hour LATITUDE LONGITUDE HEIGHT
26 Keith CM8 M 2009-05-28 2:00:00 2009 5 28 2 49.7518424 -123.6099396 705.87
26 Keith CM8 M 2009-06-09 7:00:00 2009 6 9 7 49.7518495 -123.4860212 191.61
26 Keith CM8 M 2009-05-31 18:00:00 2009 5 31 18 49.7518576 -123.5373316 410.96
26 Jack CM6 M 2009-06-01 22:00:00 2009 6 1 22 -78.6917357 17.5506138 490.23
26 Keith CM8 M 2009-05-28 2:00:00 2009 5 28 2 49.7518424 -123.6099396 705.87
26 Keith CM8 M 2009-06-09 7:00:00 2009 6 9 7 49.7518495 -123.4860212 191.61
26 Keith CM8 M 2009-05-31 18:00:00 2009 5 31 18 49.7518576 -123.5373316 410.96
27 Keith CM8 M 2009-05-28 3:00:00 2009 5 28 3 49.7518775 -123.6099242 713.05
27 Keith CM8 M 2009-06-09 10:00:00 2009 6 9 10 49.7519163 -123.486203 108.02
Here is the code I used, which does work to remove the 0 and NA values:
library(dplyr)

data <- data_all %>%
  filter(!is.na(LATITUDE), LATITUDE != 0,
         !is.na(LONGITUDE), LONGITUDE != 0)
Now I would like to also remove the 4th data row shown above (and any other invalid or incorrect spatial points). I tried the following line of code, but it does not work:
data <- filter(LATITUDE != -78.69174, LONGITUDE != 17.55061)
I cannot see any reduction in the number of rows after running it. Please note that I do not have row numbers, so I cannot remove row 4 by index; ideally, I would like to remove all rows with odd coordinate values in one working line of code (or as part of a pipe). Your help would be most appreciated. Thanks!
Answer:
The stored values are likely different from what is displayed, so exact equality comparisons on floating-point coordinates will fail. Use dplyr::near() for approximate matches to coordinates you know are incorrect. If I were you, I'd use mutate() first to flag incorrect coordinates in a new boolean column, then filter on that column.
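For example, here is a minimal sketch of that idea (the flagged coordinates come from the question; the tolerance of 1e-4 degrees, roughly 10 m, is an assumption you may want to adjust):
library(dplyr)

data <- data_all %>%
  # flag rows whose coordinates approximately match the known-bad point;
  # near() compares within a tolerance instead of testing exact equality
  mutate(bad_coord = near(LATITUDE, -78.6917357, tol = 1e-4) &
           near(LONGITUDE, 17.5506138, tol = 1e-4)) %>%
  # keep everything that is not flagged, then drop the helper column
  filter(!bad_coord) %>%
  select(-bad_coord)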
Answer:
Here's an approach that keeps only latitude and longitude values within an expected range:
data <- data_all %>%
  filter(!is.na(LATITUDE), between(LATITUDE, 49, 51),
         !is.na(LONGITUDE), between(LONGITUDE, -125, -122))
or equivalently
data <- data_all %>%
  filter(!is.na(LATITUDE), LATITUDE >= 49, LATITUDE <= 51,
         !is.na(LONGITUDE), LONGITUDE >= -125, LONGITUDE <= -122)
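As a quick sanity check (assuming data_all is still the unfiltered frame), you can count how many rows the range filter removed:
# number of rows dropped by the range filter
nrow(data_all) - nrow(data)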