Home > Mobile >  Empty Column Showing Zero Empty Values and Not Removable
Empty Column Showing Zero Empty Values and Not Removable

Time:12-28

I have the following dataframe with an empty column:

my_df <- structure(list(X1 = c("​", "​", "Assets", "Cash", 
"Liability"), X2 = c("​", "​", "​", "​", 
"​"), X4 = c("​", "Uncredited", "21,765", "11,534​", 
"10,738"), X7 = c("​", "Uncredited", "24,627", "​14,412", 
"12,845")), row.names = c(NA, 5L), class = "data.frame")

my_df

     X1            X2          X4            X7
    <chr>         <chr>       <chr>        <chr>
1   ​   ​   
2   ​   ​                 Uncredited    Uncredited
3   Assets  ​   ​           ​21,765           24,627 
4   Cash    ​               11,534        14,412 
5   Liability   ​           10,738        12,845 

The str method shows the column 2 is completely empty. Here it's column 2, but it can be column 3 or column 4 in different dataframes.

str(my_df)

'data.frame':   5 obs. of  4 variables:
 $ X1: chr  "​" "​" "Assets" ...
 $ X2: chr  "​" "​" "​" "​" ...
 $ X4: chr  "​" "Uncredited" "21,765" ...
 $ X7: chr  "​" "Uncredited" "24,627" ...

However, if we try to count the empty values in each column, we see 0 empty row for the column 2.

colSums(is.na(my_df) | my_df == "")

X1  X2  X4  X7
0   0   1   1

Maybe that's the reason, whatever methods I have tried below to remove empty column didn't work.

# Using `discard`
my_df %>% discard(~all(is.na(.) | . ==""))

# Using `colSums`
empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)
my_df[, !empty_columns]

# Using `sapply`
empty_columns <- sapply(my_df, function(x) all(is.na(x) | x == ""))
my_df[, !empty_columns]

What would be the problem with the column 2 -- column number differs sometimes -- and how it can be removed? Any suggestions would be appreciated. Thanks!

CodePudding user response:

The reason is that it is not a blank

 charToRaw(my_df$X2[1])
[1] e2 80 8b
charToRaw("")
raw(0)

We may replace the elements to blank and then do the select

library(dplyr)
library(textclean)
 my_df %>%
   mutate(across(where(is.character), 
       ~ replace_non_ascii(.x, NA_character_))) %>%
   select(where(~any(complete.cases(.x))))

-output

       X1         X4         X7
1          ​           ​           ​
2          ​ Uncredited Uncredited
3    Assets     21,765     24,627
4      Cash     11,534​     ​14,412
5 Liability     10,738     12,845
  • Related