separate_rows with unequal size of strings in R

Suppose I have a dataset like this:

   a       b
"1/2/3" "a/b/c"
 "3/5"  "e/d/s"
  "1"     "f"

I want to use separate_rows(), but I can't because of the second row, where a and b split into different numbers of pieces. How can I find rows like this?

CodePudding user response：

You can find the rows with unequal numbers of '/' symbols by doing:

which(lengths(strsplit(df$a, '/')) != lengths(strsplit(df$b, '/')))
#> [1] 2

Presumably these rows contain data input mistakes, since the number of rows implied by each entry is different.
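To review the flagged rows rather than just their indices, the piece counts can be shown next to the data. A minimal base-R sketch, assuming the question's data is in a data frame called df; the names n_a, n_b, mismatch and bad_rows are made up for illustration:

```r
# Example data from the question
df <- data.frame(
  a = c("1/2/3", "3/5", "1"),
  b = c("a/b/c", "e/d/s", "f"),
  stringsAsFactors = FALSE
)

# Number of pieces each entry splits into
n_a <- lengths(strsplit(df$a, "/", fixed = TRUE))
n_b <- lengths(strsplit(df$b, "/", fixed = TRUE))

# Keep only the rows where the two counts disagree
mismatch <- n_a != n_b
bad_rows <- data.frame(
  row = which(mismatch),
  df[mismatch, , drop = FALSE],
  n_a = n_a[mismatch],
  n_b = n_b[mismatch]
)
print(bad_rows)
```

Seeing both counts side by side makes it easier to decide which column holds the input mistake.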

CodePudding user response：

Or you can directly count the "/" characters in each column and return the rows where the counts differ.

library(stringr)

with(df, which(str_count(a, "/") != str_count(b, "/")))
#> [1] 2
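Once the inconsistent rows are known, one option is to split only the consistent rows and set the rest aside for correction. A sketch under the assumption that stringr and tidyr are installed; consistent, long and to_fix are illustrative names, not part of either package:

```r
library(stringr)
library(tidyr)

# Example data from the question
df <- data.frame(
  a = c("1/2/3", "3/5", "1"),
  b = c("a/b/c", "e/d/s", "f"),
  stringsAsFactors = FALSE
)

# TRUE where a and b contain the same number of "/" separators
consistent <- str_count(df$a, fixed("/")) == str_count(df$b, fixed("/"))

# Split only the consistent rows; a and b are paired by position
long <- separate_rows(df[consistent, ], a, b, sep = "/")

# Rows set aside for manual correction
to_fix <- df[!consistent, ]

print(long)
print(to_fix)
```

Passing both columns to a single separate_rows() call keeps the values paired by position, which is why the counts have to match first.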

Input data

df <- structure(list(a = c("1/2/3", "3/5", "1"), b = c("a/b/c", "e/d/s", 
"f")), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

Or you can keep all of your rows and call separate_rows() twice to dodge that error.

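library(tidyr)  # separate_rows()
library(dplyr)  # %>%

# read-in code
tibble::tribble(
  ~a,       ~b,
  "1/2/3", "a/b/c",
  "3/5",  "e/d/s",
  "1",     "f"
) %>% 
  as.data.frame() %>%
  # end read-in code
  # split one column at a time, so the piece counts never have to match
  separate_rows(b) %>% 
  separate_rows(a)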
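One thing to be aware of: splitting the columns one at a time yields every combination of a and b values within each original row, not a position-by-position pairing. A small sketch that checks the resulting row count, assuming tidyr is installed; crossed and expected are illustrative names:

```r
library(tidyr)

# Example data from the question
df <- data.frame(
  a = c("1/2/3", "3/5", "1"),
  b = c("a/b/c", "e/d/s", "f"),
  stringsAsFactors = FALSE
)

# Split b first, then a, as in the answer above
crossed <- separate_rows(separate_rows(df, b, sep = "/"), a, sep = "/")

# Each original row contributes (pieces in a) * (pieces in b) rows
n_a <- lengths(strsplit(df$a, "/", fixed = TRUE))
n_b <- lengths(strsplit(df$b, "/", fixed = TRUE))
expected <- sum(n_a * n_b)

print(nrow(crossed))
print(expected)
```

For the example data this gives 9 + 6 + 1 = 16 rows, compared with 7 if the values were paired by position and padded.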

CodePudding user response：

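Perhaps cSplit() from splitstackshape would help. It pads the shorter column with NA instead of failing, and the filter() step then drops any row where both a and b are NA.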

library(splitstackshape)
library(dplyr)
cSplit(df, c("a", "b"), sep = "/", "long") %>% 
   filter(if_any(c(a, b), complete.cases))

Output

      a      b
   <int> <char>
1:     1      a
2:     2      b
3:     3      c
4:     3      e
5:     5      d
6:    NA      s
7:     1      f
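For comparison, the padded layout in the output above can be approximated without splitstackshape. This is a rough base-R sketch, not cSplit's actual implementation; pad_to and the other names are hypothetical, and unlike cSplit it leaves both columns as character:

```r
# Example data from the question
df <- data.frame(
  a = c("1/2/3", "3/5", "1"),
  b = c("a/b/c", "e/d/s", "f"),
  stringsAsFactors = FALSE
)

parts_a <- strsplit(df$a, "/", fixed = TRUE)
parts_b <- strsplit(df$b, "/", fixed = TRUE)

# Rows needed per original row: the longer of the two splits
n <- pmax(lengths(parts_a), lengths(parts_b))

# Indexing past the end of a vector yields NA, which pads short entries
pad_to <- function(parts, n) {
  unlist(Map(function(p, k) p[seq_len(k)], parts, n), use.names = FALSE)
}

long <- data.frame(
  a = pad_to(parts_a, n),
  b = pad_to(parts_b, n),
  stringsAsFactors = FALSE
)
print(long)
```

The NA in column a marks where the second row ran out of values, so the mismatch stays visible in the long format.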

data

df <- structure(list(a = c("1/2/3", "3/5", "1"), b = c("a/b/c", "e/d/s", 
"f")), class = "data.frame", row.names = c(NA, -3L))