How to filter a specific column character in R after importing data with fread function

Time:10-24

I have imported a .tsv file into R with the fread function from the data.table package:

dt <- fread( "full_data.tsv", nrows = 1000000)

The dataset has 37 columns and looks something like this:

| ID  |    DATA   |  lan  |geo_coord  | 
|:----|:---------:| -----:|----------:|
|10002| 2020-02-01| eng   |[10.2,32.5]|
|10003| 2020-02-01| eng   |[12.2,42.5]|
|10004| 2020-02-01| eng   |[14.4,22.6]|
|10005|           | eng   |[32.6,23.5]|
|10004| 2020-02-01| eng   |[16.2,21.2]|
|10006|           | eng   |[16.7,20.2]|
|10007| 2020-02-01| eng   |           |
|10008| 2020-02-01| eng   |           |
|10009| 2020-02-01| eng   |           |

I would like to filter on the geo_coord column (character) only, removing the rows where it is empty, to obtain a result like this:

| ID  |    DATA   |  lan  |geo_coord  | 
|:----|:---------:| -----:|----------:|
|10002| 2020-02-01| eng   |[10.2,32.5]|
|10003| 2020-02-01| eng   |[12.2,42.5]|
|10004| 2020-02-01| eng   |[14.4,22.6]|
|10005|           | eng   |[32.6,23.5]|
|10004| 2020-02-01| eng   |[16.2,21.2]|
|10006|           | eng   |[16.7,20.2]|

I tried filter from dplyr without success.

Thanks in advance for any suggestion or help!

CodePudding user response:

Did you try the following?

library(tidyverse)
dt %>%
  filter(geo_coord != "" & !is.na(geo_coord))

It seems the missing values are not coded as NA, but are stored as empty strings.
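
Since the data was read with fread, the same filter can probably also be written in data.table syntax without loading dplyr. The sketch below is only an illustration: the toy table is an assumption built from the sample rows in the question, and `filtered` is a hypothetical name.

```r
library(data.table)

# Toy data mirroring the sample rows in the question (assumed, not the real file)
dt <- data.table(
  ID = c(10002, 10003, 10004, 10005, 10004, 10006, 10007, 10008, 10009),
  DATA = c("2020-02-01", "2020-02-01", "2020-02-01", "", "2020-02-01",
           "", "2020-02-01", "2020-02-01", "2020-02-01"),
  lan = "eng",
  geo_coord = c("[10.2,32.5]", "[12.2,42.5]", "[14.4,22.6]", "[32.6,23.5]",
                "[16.2,21.2]", "[16.7,20.2]", "", "", "")
)

# Keep rows where geo_coord is neither NA nor an empty string;
# empty values in DATA are left alone because only geo_coord is tested
filtered <- dt[!is.na(geo_coord) & geo_coord != ""]
print(filtered)
```

Only geo_coord is tested, so the rows with an empty DATA value are kept, as in the desired output.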

CodePudding user response:

We may use na_if to convert the empty strings to NA and then keep the complete cases:

library(dplyr)
dt %>%
    filter(complete.cases(na_if(geo_coord, "")))
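
To see what the two helpers do, here is a minimal sketch on a hypothetical vector `x` (not taken from the real data): na_if turns the empty strings into NA, and complete.cases then flags the non-missing entries.

```r
library(dplyr)

# Hypothetical values: one coordinate, one empty string, one NA
x <- c("[10.2,32.5]", "", NA)

# na_if() replaces every value equal to "" with NA
converted <- na_if(x, "")

# complete.cases() is TRUE only where the value is not missing
keep <- complete.cases(converted)
print(keep)
```

Inside filter, this logical vector selects the rows to keep, so both empty strings and existing NA values are dropped in one step.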