I have a data frame, but I cannot remove row 8:
PKV_clean
ID x
1 1 scharfkantig
2 1 t
4 1 seit paartagen
8 1
10 1 knirscht
11 1 schiene empohlen
12 1 meldet
Neither of these attempts works:
PKV_clean <- PKV_clean[!apply(is.na(PKV_clean) | PKV_clean == " ", 1, all),]
PKV_clean <- PKV_clean[!(is.na(PKV_clean$x) | PKV_clean$x ==""), ]
Both are meant to remove NAs and empty strings. Nor can I remove the single trailing whitespace in row 12 when I build a corpus:
PKV_clean <- tm_map(PKV_clean, stripWhitespace)
These functions run without any error message, but they don't remove anything. Could there be hidden characters that the viewer doesn't show?
Edit1:
dput(PKV_clean)
structure(list(ID = c("1", "1", "1", "1", "1", "1", "1"), x = c(" scharfkantig",
"t", " seit paartagen", " ", " knirscht", " schiene empohlen",
" meldet ")), row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L), class = "data.frame")
CodePudding user response:
You have a lot of unnecessary whitespace in your vector x. Row 8 is actually " ", not "". First, you can trim the whitespace, and then filter out empty strings:
library(dplyr)
library(stringr)
PKV_clean %>%
  mutate(x = str_trim(x)) %>%  # drop leading and trailing whitespace
  filter(x != "")              # keep only non-empty strings
ID x
1 1 scharfkantig
2 1 t
3 1 seit paartagen
4 1 knirscht
5 1 schiene empohlen
6 1 meldet
More directly, you can just do this (if you don't care about the whitespace in the other parts of the column):
PKV_clean[PKV_clean$x != " ", ]
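To see why the original attempts left row 8 in place, it can help to inspect the raw strings. The following is only a minimal sketch, assuming the data is exactly what the dput() output in the question shows; the helper names is_na_or_empty and all_blank are made up for illustration.

```r
# Rebuild the data frame from the dput() output in the question
PKV_clean <- structure(list(ID = c("1", "1", "1", "1", "1", "1", "1"),
  x = c(" scharfkantig", "t", " seit paartagen", " ", " knirscht",
        " schiene empohlen", " meldet ")),
  row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L), class = "data.frame")

# Row 8 holds a single space, so it is neither NA nor the empty string ""
is_na_or_empty <- is.na(PKV_clean$x) | PKV_clean$x == ""
any(is_na_or_empty)

# nchar() exposes whitespace that the printed data frame hides
nchar(PKV_clean$x)

# The row-wise all() test needs every column to be blank,
# but ID is never blank, so no row qualifies
all_blank <- apply(is.na(PKV_clean) | PKV_clean == " ", 1, all)
any(all_blank)
```

If nchar() reports a length greater than zero for a cell that looks empty, comparing against "" will not match it, which is why trimming first is the safer route.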
CodePudding user response:
We may use base R as well:
subset(PKV_clean, nzchar(trimws(x)))
ID x
1 1 scharfkantig
2 1 t
4 1 seit paartagen
10 1 knirscht
11 1 schiene empohlen
12 1 meldet
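Note that subset() above only filters rows; the remaining strings keep their leading and trailing spaces. If the trimmed values should be stored as well, and NAs dropped at the same time, something along these lines might work. It is a sketch under the assumption that the data matches the dput() output in the question; PKV_trimmed and keep are illustrative names, not part of the original code.

```r
# Rebuild the data frame from the dput() output in the question
PKV_clean <- structure(list(ID = c("1", "1", "1", "1", "1", "1", "1"),
  x = c(" scharfkantig", "t", " seit paartagen", " ", " knirscht",
        " schiene empohlen", " meldet ")),
  row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L), class = "data.frame")

# Work on a copy so the original stays available for comparison
PKV_trimmed <- PKV_clean

# Strip leading and trailing whitespace from every entry
PKV_trimmed$x <- trimws(PKV_trimmed$x)

# nzchar(NA) is TRUE by default, so NAs need an explicit check
keep <- !is.na(PKV_trimmed$x) & nzchar(PKV_trimmed$x)
PKV_trimmed <- PKV_trimmed[keep, ]

PKV_trimmed
```

Trimming before building the corpus may also explain the tm_map() observation: as far as I know, stripWhitespace collapses runs of whitespace into a single blank rather than removing leading or trailing blanks, so a lone trailing space would survive it.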