Remove row from tibble containing empty string in R?

Time:11-07

Pretty basic question that has me confused. How do I remove rows from a tibble that contain empty character strings?

For example:

library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr

df <- data.frame(
  data = c(
    "1 1 2 2 3, 3 4 5 6 7",
    "1 1 3 3, 2 3 4 5",
    ", ",
    "1 1, 2 3"
  ),
  num = c(2, 3, 1, 4)
)


dfTest <- df %>%
  as_tibble() %>%
  setNames(c("data", "num")) %>%
  separate(data, c("col1", "col2"), ", ")

> dfTest
# A tibble: 4 × 3
  col1        col2          num
  <chr>       <chr>       <dbl>
1 "1 1 2 2 3" "3 4 5 6 7"     2
2 "1 1 3 3"   "2 3 4 5"       3
3 ""          ""              1
4 "1 1"       "2 3"           4

Looking at dfTest, we can see that row 3 contains only empty character strings. How can I remove rows like this from a tibble?

CodePudding user response:

Try base R (note that this checks col1 only):

# keep the rows where col1 is not an empty string
dfTest[dfTest$col1 != "", ]
  col1      col2        num
  <chr>     <chr>     <dbl>
1 1 1 2 2 3 3 4 5 6 7     2
2 1 1 3 3   2 3 4 5       3
3 1 1       2 3           4
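
If more than one column needs checking, the same base R idea can be extended by counting the empty strings in each row. This is only a sketch: the tibble is rebuilt by hand with the values from the question, and the names cols, n_empty, no_empty and not_all_empty are made up for illustration.

```r
# Rebuild the example by hand (same values as dfTest in the question).
dfTest <- tibble::tibble(
  col1 = c("1 1 2 2 3", "1 1 3 3", "", "1 1"),
  col2 = c("3 4 5 6 7", "2 3 4 5", "", "2 3"),
  num  = c(2, 3, 1, 4)
)

cols <- c("col1", "col2")               # columns to check (an assumption)
n_empty <- rowSums(dfTest[cols] == "")  # number of empty strings in each row

# Drop a row if any of the checked columns is empty.
no_empty <- dfTest[n_empty == 0, ]

# Drop a row only if all of the checked columns are empty.
not_all_empty <- dfTest[n_empty < length(cols), ]
```

With the data from the question both variants drop the same row, because row 3 is the only one with an empty string and it is empty in both columns.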

CodePudding user response:

A tidyverse solution:

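# filter() expects a logical vector, so test the columns explicitly
# (if_all() needs dplyr >= 1.0.4)
dfTest <- dfTest %>% 
  filter(if_all(c(col1, col2), ~ .x != ""))  # keep rows where neither column is empty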

# A tibble: 3 x 3
  col1      col2        num
  <chr>     <chr>     <dbl>
1 1 1 2 2 3 3 4 5 6 7     2
2 1 1 3 3   2 3 4 5       3
3 1 1       2 3           4

CodePudding user response:

From your question it is not quite clear whether you want a row removed only when both strings are empty, or when either of them is.

I changed your example slightly to bring out the difference. The approach below lets you choose which columns to check, and whether a row is removed when any of them is empty or only when all of them are.

library(dplyr)
library(tidyr)

df <- data.frame(
  data = c(
    "1 1 2 2 3, 3 4 5 6 7",
    "1 1 3 3, ",
    ", ",
    "1 1, 2 3"
  ),
  num = c(2, 3, 1, 4)
)


dfTest <- df %>%
  as_tibble() %>%
  setNames(c("data", "num")) %>%
  separate(data, c("col1", "col2"), ", ") %>% 
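  rowwise() %>%                                # evaluate the condition one row at a time
  # drop the row if any checked column is empty; use all() instead of any()
  # to drop it only when every checked column is empty
  filter(!any(c_across(col1:col2) == "")) %>%
  ungroup()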
dfTest
#> # A tibble: 2 × 3
#>   col1      col2        num
#>   <chr>     <chr>     <dbl>
#> 1 1 1 2 2 3 3 4 5 6 7     2
#> 2 1 1       2 3           4

Created on 2021-11-07 by the reprex package (v2.0.1)
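
As a possible alternative to rowwise(), the any/all distinction can also be written with if_any() and if_all(), which are available in dplyr 1.0.4 and later. This is a sketch rather than a drop-in replacement: the tibble below is typed in by hand with the values the modified example has after separate(), and the result names any_empty_removed and all_empty_removed are made up for illustration.

```r
library(dplyr)

# Values of the modified example above, as they look after separate().
dfTest <- tibble(
  col1 = c("1 1 2 2 3", "1 1 3 3", "", "1 1"),
  col2 = c("3 4 5 6 7", "", "", "2 3"),
  num  = c(2, 3, 1, 4)
)

# Remove a row when ANY of the checked columns is empty.
any_empty_removed <- dfTest %>%
  filter(!if_any(col1:col2, ~ .x == ""))

# Remove a row only when ALL of the checked columns are empty.
all_empty_removed <- dfTest %>%
  filter(!if_all(col1:col2, ~ .x == ""))
```

The first version drops the rows with num 3 and 1, matching the output above. The second drops only the row with num 1, since the row with num 3 still has a value in col1.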
