Remove invalid and incorrect spatial points (Latitude and longitude) in R

Time:11-27

I have well over 100,000 GPS locations of 35 animals. I have removed the 'NA' and '0' latitude-longitude locations, but noticed that one location is incorrect and needs to be removed (in this subset of data, the 4th line, which has -78.6917357 and 17.5506138 as LATITUDE and LONGITUDE). There are likely other incorrect GPS locations, and I wondered whether there is an easy way to identify outliers and remove them.

My sample data looks like this:

COLLAR  NAME   Animal_ID  SEX  DATE        TIME      Year  Month  Day  Hour  LATITUDE     LONGITUDE     HEIGHT
26      Keith  CM8        M    2009-05-28  2:00:00   2009  5      28   2     49.7518424   -123.6099396  705.87
26      Keith  CM8        M    2009-06-09  7:00:00   2009  6      9    7     49.7518495   -123.4860212  191.61
26      Keith  CM8        M    2009-05-31  18:00:00  2009  5      31   18    49.7518576   -123.5373316  410.96
26      Jack   CM6        M    2009-06-01  22:00:00  2009  6      1    22    -78.6917357  17.5506138    490.23
26      Keith  CM8        M    2009-05-28  2:00:00   2009  5      28   2     49.7518424   -123.6099396  705.87
26      Keith  CM8        M    2009-06-09  7:00:00   2009  6      9    7     49.7518495   -123.4860212  191.61
26      Keith  CM8        M    2009-05-31  18:00:00  2009  5      31   18    49.7518576   -123.5373316  410.96
27      Keith  CM8        M    2009-05-28  3:00:00   2009  5      28   3     49.7518775   -123.6099242  713.05
27      Keith  CM8        M    2009-06-09  10:00:00  2009  6      9    10    49.7519163   -123.486203   108.02

This is the code I used, which successfully removes the 0 and NA values:

    library(dplyr)
    data <- data_all %>%
     filter(!is.na(LATITUDE), LATITUDE !=0,!is.na(LONGITUDE), LONGITUDE !=0)

Now, I would like to further remove row 4 here (and any other invalid or incorrect spatial points) using the following line of code but that does not work:

    data <- filter(LATITUDE !=-78.69174, LONGITUDE !=17.55061)

I cannot see a reduction in the number of rows after running this code. Please note that I do not have row numbers so cannot specifically remove row 4 and, ideally, I want to remove all those rows that have odd values in one line of code (or as a pipe function) that does work. Your help would be most appreciated. Thanks!

CodePudding user response:

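The stored values likely differ from what is displayed: R prints about 7 significant digits by default, so -78.6917357 shows up as -78.69174, and an exact != comparison against the printed number never matches. Use dplyr::near() for approximate matches to coordinates you know are incorrect, passing a tol argument, because its default tolerance is much tighter than the rounding in the printed values. If I were you, I'd use mutate() first to flag incorrect coordinates in a new boolean column, then filter on that column.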
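A minimal sketch of that suggestion, using a small made-up data frame in place of data_all (the coordinates are copied from the sample rows in the question). The tolerance tol = 1e-4 and the column name bad_fix are assumptions for illustration, not anything from the original post:

```r
library(dplyr)

# Toy stand-in for data_all; coordinates copied from the question's sample
data_all <- data.frame(
  NAME      = c("Keith", "Keith", "Keith", "Jack"),
  LATITUDE  = c(49.7518424, 49.7518495, 49.7518576, -78.6917357),
  LONGITUDE = c(-123.6099396, -123.4860212, -123.5373316, 17.5506138)
)

data <- data_all %>%
  # Flag the known bad fix. tol is an assumed tolerance, loose enough to
  # absorb the rounding in the printed coordinates.
  mutate(bad_fix = near(LATITUDE, -78.69174, tol = 1e-4) &
                   near(LONGITUDE, 17.55061, tol = 1e-4)) %>%
  # Keep only rows that were not flagged, then drop the helper column
  filter(!bad_fix) %>%
  select(-bad_fix)
```

Keeping the flag in its own column also makes it possible to inspect the flagged rows before dropping them, for example with filter(bad_fix).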

CodePudding user response:

Here's an approach that limits to values of latitude and longitude within an expected range:

    data <- data_all %>%
      filter(!is.na(LATITUDE),  between(LATITUDE,  49, 51),
             !is.na(LONGITUDE), between(LONGITUDE, -125, -122))

or equivalently:

    data <- data_all %>%
      filter(!is.na(LATITUDE),  LATITUDE  >= 49,   LATITUDE  <= 51,
             !is.na(LONGITUDE), LONGITUDE >= -125, LONGITUDE <= -122)
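If the expected range is not known up front, one possible variation (not from the answers above) is to derive it from the data by flagging fixes that sit far from the median position. This is only a rough sketch: it assumes all animals share one study area, the 2-degree threshold max_dev is an arbitrary assumption, and the toy data frame stands in for data_all:

```r
library(dplyr)

# Toy stand-in for data_all; coordinates copied from the question's sample
data_all <- data.frame(
  Animal_ID = c("CM8", "CM8", "CM8", "CM6", "CM8"),
  LATITUDE  = c(49.7518424, 49.7518495, 49.7518576, -78.6917357, 49.7518775),
  LONGITUDE = c(-123.6099396, -123.4860212, -123.5373316, 17.5506138, -123.6099242)
)

max_dev <- 2  # assumed threshold in degrees; tune it to the study area

data <- data_all %>%
  filter(!is.na(LATITUDE), !is.na(LONGITUDE)) %>%
  # Flag rows whose latitude or longitude is more than max_dev degrees
  # away from the dataset-wide median
  mutate(outlier = abs(LATITUDE  - median(LATITUDE))  > max_dev |
                   abs(LONGITUDE - median(LONGITUDE)) > max_dev) %>%
  filter(!outlier) %>%
  select(-outlier)
```

The median is used rather than the mean because a single wildly wrong fix barely moves it. Adding group_by(Animal_ID) before the mutate() would compare each animal against its own median, but that cannot catch an animal whose only fix is the bad one, as with CM6 in this sample.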