Suppose I have a dataset like this:
a b
"1/2/3" "a/b/c"
"3/5" "e/d/s"
"1" "f"
I want to use separate_rows(), but I can't because of the second row, where a and b split into different numbers of pieces. How can I find rows like this?
CodePudding user response:
You can find the rows with unequal numbers of '/' symbols by doing:
which(lengths(strsplit(df$a, '/')) != lengths(strsplit(df$b, '/')))
#> [1] 2
Presumably these rows contain data input mistakes, since the number of rows implied by each entry is different.
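If it helps, here is a minimal sketch of how that index could be used to set the mismatched rows aside and split only the consistent ones. The variable names (n_a, n_b, bad, mismatched, clean) are just illustrative, and it assumes the tidyr package is installed and that "/" is the only delimiter:

```r
library(tidyr)

df <- data.frame(a = c("1/2/3", "3/5", "1"),
                 b = c("a/b/c", "e/d/s", "f"))

# Number of pieces each entry splits into (fixed = TRUE treats "/" literally)
n_a <- lengths(strsplit(df$a, "/", fixed = TRUE))
n_b <- lengths(strsplit(df$b, "/", fixed = TRUE))

# TRUE where the two columns disagree
bad <- n_a != n_b

# Rows to inspect by hand
mismatched <- df[bad, ]

# Rows that separate_rows() can handle
clean <- separate_rows(df[!bad, ], a, b, sep = "/")

which(bad)
#> [1] 2
nrow(clean)
#> [1] 4
```

Whether the mismatched rows should be fixed, dropped, or padded depends on what the data means, so this only isolates them.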
CodePudding user response:
Or you can count the number of "/" in each column directly and return the rows where the two counts differ.
library(stringr)
with(df, which(str_count(a, "/") != str_count(b, "/")))
#> [1] 2
Input data
df <- structure(list(a = c("1/2/3", "3/5", "1"), b = c("a/b/c", "e/d/s",
"f")), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
Or you can keep all of your rows and call separate_rows() twice, once per column, to dodge that error.
library(tidyr) # separate_rows() and the %>% pipe
# read-in code
tibble::tribble(
~a, ~b,
"1/2/3", "a/b/c",
"3/5", "e/d/s",
"1", "f"
) %>%
as.data.frame() %>%
# end read-in code
separate_rows(b) %>%
separate_rows(a)
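One thing worth checking before relying on this: splitting the columns one at a time does not pair values by position. Each value of b gets combined with every value of a from the same original row. A small sketch of the row counts, assuming tidyr is installed (the name crossed is just illustrative):

```r
library(tidyr)

df <- data.frame(a = c("1/2/3", "3/5", "1"),
                 b = c("a/b/c", "e/d/s", "f"))

# Split b first, then a, as in the answer above
split_b <- separate_rows(df, b, sep = "/")
crossed <- separate_rows(split_b, a, sep = "/")

# 3 * 3 rows from the first entry, 2 * 3 from the second, 1 * 1 from the third
nrow(crossed)
#> [1] 16
```

If the values are meant to line up pairwise (1 with a, 2 with b, and so on), this is probably not the result you want.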
CodePudding user response:
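Perhaps cSplit() from splitstackshape would help. It keeps every value and fills the gaps with NA; the filter() then drops rows where both columns are NA.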
library(splitstackshape)
library(dplyr)
cSplit(df, c("a", "b"), sep = "/", direction = "long") %>%
filter(if_any(c(a, b), complete.cases))
Output
a b
<int> <char>
1: 1 a
2: 2 b
3: 3 c
4: 3 e
5: 5 d
6: NA s
7: 1 f
Input data
df <- structure(list(a = c("1/2/3", "3/5", "1"), b = c("a/b/c", "e/d/s",
"f")), class = "data.frame", row.names = c(NA, -3L))