I have imported a .tsv file into R with the fread function from the data.table package:
dt <- fread("full_data.tsv", nrows = 1000000)
The dataset has 37 columns, something like this:
| ID | DATA | lan |geo_coord |
|:----|:---------:| -----:|----------:|
|10002| 2020-02-01| eng |[10.2,32.5]|
|10003| 2020-02-01| eng |[12.2,42.5]|
|10004| 2020-02-01| eng |[14.4,22.6]|
|10005| | eng |[32.6,23.5]|
|10004| 2020-02-01| eng |[16.2,21.2]|
|10006| | eng |[16.7,20.2]|
|10007| 2020-02-01| eng | |
|10008| 2020-02-01| eng | |
|10009| 2020-02-01| eng | |
I would like to filter on the geo_coord column (character) to remove rows with empty cells, obtaining a result like this:
| ID | DATA | lan |geo_coord |
|:----|:---------:| -----:|----------:|
|10002| 2020-02-01| eng |[10.2,32.5]|
|10003| 2020-02-01| eng |[12.2,42.5]|
|10004| 2020-02-01| eng |[14.4,22.6]|
|10005| | eng |[32.6,23.5]|
|10004| 2020-02-01| eng |[16.2,21.2]|
|10006| | eng |[16.7,20.2]|
I tried with filter from dplyr without success.
Thanks in advance for any suggestion or help!
CodePudding user response:
Did you try
library(tidyverse)
dt %>%
  filter(geo_coord != "" & !is.na(geo_coord))
It seems the missing values are not coded as such, but are empty strings.
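Since the data was read with fread, the same filter can also be written in data.table syntax, avoiding the dplyr dependency. A minimal sketch on a small made-up sample (the real 37-column data isn't shown):

```r
library(data.table)

# hypothetical sample mimicking the question's columns
dt <- data.table(
  ID = c(10002, 10007),
  DATA = c("2020-02-01", "2020-02-01"),
  lan = c("eng", "eng"),
  geo_coord = c("[10.2,32.5]", "")
)

# keep only rows where geo_coord is a non-empty, non-NA string
dt_clean <- dt[!is.na(geo_coord) & geo_coord != ""]
```

This subsets in place on the data.table, which is usually faster than converting through dplyr verbs on a million-row table.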
CodePudding user response:
We may use
library(dplyr)
dt %>%
  filter(complete.cases(na_if(geo_coord, "")))
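If some cells contain only whitespace rather than truly empty strings (an assumption, since the raw file isn't shown), trimming first makes either filter more robust. A sketch on a hypothetical three-row sample:

```r
library(dplyr)

# hypothetical sample: one valid coordinate, one empty cell,
# and one whitespace-only cell
dt <- data.frame(
  ID = c(10002, 10007, 10008),
  geo_coord = c("[10.2,32.5]", "", "  ")
)

# trimws() strips leading/trailing whitespace before the emptiness check
dt_clean <- dt %>%
  filter(!is.na(geo_coord), trimws(geo_coord) != "")
```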