Can't remove a row with an empty value, or stray whitespace, from a data frame using the common methods

Time:08-17

I have a data frame from which I cannot remove row 8:

PKV_clean
   ID            x
1   1      scharfkantig
2   1                 t
4   1    seit paartagen
8   1                  
10  1          knirscht
11  1  schiene empohlen
12  1           meldet 

Neither of these attempts works:

PKV_clean <- PKV_clean[!apply(is.na(PKV_clean) | PKV_clean == " ", 1, all),]

PKV_clean <- PKV_clean[!(is.na(PKV_clean$x) | PKV_clean$x ==""), ]

Both are meant to remove NAs and empty strings.

Nor can I remove the trailing whitespace in row 12 when I build a corpus:

PKV_clean <-  tm_map(PKV_clean, stripWhitespace)

These functions run without any error message, but they don't remove anything. Could there be hidden characters that the viewer doesn't show?

Edit1:

dput(PKV_clean)
structure(list(ID = c("1", "1", "1", "1", "1", "1", "1"), x = c("    scharfkantig", 
"t", " seit paartagen", " ", " knirscht", " schiene empohlen", 
"  meldet ")), row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L), class = "data.frame")

CodePudding user response:

The values in your column x contain a lot of unnecessary whitespace. Row 8 is actually " " (a single space), not "" (an empty string), so a comparison with "" never matches it. First trim the whitespace, then filter out the empty strings:

library(dplyr)
library(stringr)
PKV_clean %>% 
  mutate(x = str_trim(x)) %>% 
  filter(x != "")

  ID                x
1  1     scharfkantig
2  1                t
3  1   seit paartagen
4  1         knirscht
5  1 schiene empohlen
6  1           meldet
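
To see why the two attempts in the question leave row 8 in place, it can help to inspect the strings directly. The following is only a sketch in base R: it rebuilds the data from the dput() output in the question, and the names n_chars, attempt1 and attempt2 are made up for illustration.

```r
# Rebuild the data frame from the dput() output in the question
PKV_clean <- structure(
  list(ID = c("1", "1", "1", "1", "1", "1", "1"),
       x = c("    scharfkantig", "t", " seit paartagen", " ",
             " knirscht", " schiene empohlen", "  meldet ")),
  row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L),
  class = "data.frame")

# Character counts expose the padding that the printed table hides
n_chars <- nchar(PKV_clean$x)

# Attempt 1 flags a row only if *every* column is NA or " ";
# ID is "1" in every row, so no row can qualify
attempt1 <- apply(is.na(PKV_clean) | PKV_clean == " ", 1, all)

# Attempt 2 compares x against "", but row 8 holds " " (one space)
attempt2 <- is.na(PKV_clean$x) | PKV_clean$x == ""

print(n_chars)
print(any(attempt1))
print(any(attempt2))
```

Neither logical vector contains a TRUE, so both subsetting calls return the data frame unchanged.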

More directly, you can just do this (if you don't care about the whitespace in the other parts of the column):

PKV_clean[PKV_clean$x != " ", ]
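
The comparison above only matches a value that is exactly one space. If blank cells might also hold several spaces, a tab, an empty string, or NA, a pattern match is one possible alternative. This is a sketch in base R and not part of the original answer; is_blank is a hypothetical helper name and sample_df is a small made-up sample, not the data from the question.

```r
# Hypothetical helper: TRUE for NA, "", or strings made up only of whitespace
is_blank <- function(v) is.na(v) | grepl("^\\s*$", v)

# Made-up sample covering the different kinds of "empty" cells
sample_df <- data.frame(
  ID = "1",
  x = c("    scharfkantig", " ", "   ", "\t", NA, "", "  meldet "),
  stringsAsFactors = FALSE)

# Keep only the rows whose x is not blank; padding in kept rows is untouched
kept <- sample_df[!is_blank(sample_df$x), ]
print(kept)
```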

CodePudding user response:

Base R works as well:

subset(PKV_clean, nzchar(trimws(x)))
   ID                 x
1   1      scharfkantig
2   1                 t
4   1    seit paartagen
10  1          knirscht
11  1  schiene empohlen
12  1           meldet 
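
Note that subset() uses the trimmed value only for filtering, so the kept rows still carry their padding, as the output shows. If the padding should go as well, one option is to overwrite the column with trimws() before filtering. A minimal base R sketch, assuming the data from the dput() output in the question:

```r
# Rebuild the data frame from the dput() output in the question
PKV_clean <- structure(
  list(ID = c("1", "1", "1", "1", "1", "1", "1"),
       x = c("    scharfkantig", "t", " seit paartagen", " ",
             " knirscht", " schiene empohlen", "  meldet ")),
  row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L),
  class = "data.frame")

# Trim leading and trailing whitespace in place
PKV_clean$x <- trimws(PKV_clean$x)

# Drop rows whose x is NA or empty after trimming
PKV_clean <- PKV_clean[!is.na(PKV_clean$x) & nzchar(PKV_clean$x), ]
print(PKV_clean)
```

As for the corpus step, the tm documentation describes stripWhitespace as collapsing runs of whitespace into a single blank rather than trimming the ends, which would explain why the trailing space in row 12 survives it.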
Tags: r