I have the following dataframe with an empty column:
my_df <- structure(list(X1 = c("", "", "Assets", "Cash",
"Liability"), X2 = c("", "", "", "",
""), X4 = c("", "Uncredited", "21,765", "11,534",
"10,738"), X7 = c("", "Uncredited", "24,627", "14,412",
"12,845")), row.names = c(NA, 5L), class = "data.frame")
my_df
X1 X2 X4 X7
<chr> <chr> <chr> <chr>
1
2 Uncredited Uncredited
3 Assets 21,765 24,627
4 Cash 11,534 14,412
5 Liability 10,738 12,845
The str
method shows the column 2 is completely empty. Here it's column 2, but it can be column 3 or column 4 in different dataframes.
str(my_df)
'data.frame': 5 obs. of 4 variables:
$ X1: chr "" "" "Assets" ...
$ X2: chr "" "" "" "" ...
$ X4: chr "" "Uncredited" "21,765" ...
$ X7: chr "" "Uncredited" "24,627" ...
However, if we try to count the empty values in each column, we see 0 empty row for the column 2.
colSums(is.na(my_df) | my_df == "")
X1 X2 X4 X7
0 0 1 1
Maybe that's the reason, whatever methods I have tried below to remove empty column didn't work.
# Using `discard`
my_df %>% discard(~all(is.na(.) | . ==""))
# Using `colSums`
empty_columns <- colSums(is.na(my_df) | my_df == "") == nrow(my_df)
my_df[, !empty_columns]
# Using `sapply`
empty_columns <- sapply(my_df, function(x) all(is.na(x) | x == ""))
my_df[, !empty_columns]
What would be the problem with the column 2 -- column number differs sometimes -- and how it can be removed? Any suggestions would be appreciated. Thanks!
CodePudding user response:
The reason is that it is not a blank
charToRaw(my_df$X2[1])
[1] e2 80 8b
charToRaw("")
raw(0)
We may replace the elements to blank and then do the select
library(dplyr)
library(textclean)
my_df %>%
mutate(across(where(is.character),
~ replace_non_ascii(.x, NA_character_))) %>%
select(where(~any(complete.cases(.x))))
-output
X1 X4 X7
1
2 Uncredited Uncredited
3 Assets 21,765 24,627
4 Cash 11,534 14,412
5 Liability 10,738 12,845