Empty Column Showing Zero Empty Values and Not Removable-CodePudding

I have the following dataframe with an empty column:

my_df <- structure(list(X1 = c("", "", "Assets", "Cash", 
"Liability"), X2 = c("", "", "", "", 
""), X4 = c("", "Uncredited", "21,765", "11,534", 
"10,738"), X7 = c("", "Uncredited", "24,627", "14,412", 
"12,845")), row.names = c(NA, 5L), class = "data.frame")

my_df

     X1            X2          X4            X7
    <chr>         <chr>       <chr>        <chr>
1         
2                       Uncredited    Uncredited
3   Assets                21,765           24,627 
4   Cash                   11,534        14,412 
5   Liability              10,738        12,845

The str method shows the column 2 is completely empty. Here it's column 2, but it can be column 3 or column 4 in different dataframes.

str(my_df)

'data.frame':   5 obs. of  4 variables:
 $ X1: chr  "" "" "Assets" ...
 $ X2: chr  "" "" "" "" ...
 $ X4: chr  "" "Uncredited" "21,765" ...
 $ X7: chr  "" "Uncredited" "24,627" ...

However, if we try to count the empty values in each column, we see 0 empty row for the column 2.

colSums(is.na(my_df) | my_df == "")

X1  X2  X4  X7
0   0   1   1

Maybe that's the reason, whatever methods I have tried below to remove empty column didn't work.

# Using `discard`
my_df %>% discard(~all(is.na(.) | . ==""))

# Using `colSums`
empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)
my_df[, !empty_columns]

# Using `sapply`
empty_columns <- sapply(my_df, function(x) all(is.na(x) | x == ""))
my_df[, !empty_columns]

What would be the problem with the column 2 -- column number differs sometimes -- and how it can be removed? Any suggestions would be appreciated. Thanks!

CodePudding user response：

The reason is that it is not a blank

 charToRaw(my_df$X2[1])
[1] e2 80 8b
charToRaw("")
raw(0)

We may replace the elements to blank and then do the select

library(dplyr)
library(textclean)
 my_df %>%
   mutate(across(where(is.character), 
       ~ replace_non_ascii(.x, NA_character_))) %>%
   select(where(~any(complete.cases(.x))))

-output

       X1         X4         X7
1                                
2           Uncredited Uncredited
3    Assets     21,765     24,627
4      Cash     11,534     14,412
5 Liability     10,738     12,845