I exported a large database, and fields that had no value were exported as �. I want to calculate the average of each row, but I can't because of that character. I also tried to replace � with NA using
df[ df == "?" ] <- NA
but it didn't work. How can I calculate the average per row despite that character? Or, alternatively, how can I replace it with NA?
Thank you.
CodePudding user response:
Try this
df <- c("1", "3", "4" , "�" , 5 , "�")
df[ df == "\UFFFD" ] <- NA
Output
df
#> [1] "1" "3" "4" NA "5" NA
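The snippet above works on a vector. The same comparison can likely be applied to a whole data frame, which is closer to the question; the original attempt with "?" probably failed because it matches a literal question mark rather than the replacement character U+FFFD. A minimal sketch, assuming a small made-up data frame (the column names `a` and `b` are hypothetical):

```r
# Hypothetical data frame; the column names are assumptions for illustration
df <- data.frame(
  id = 1:3,
  a  = c("78", "\uFFFD", "64"),
  b  = c("50", "\uFFFD", "\uFFFD"),
  stringsAsFactors = FALSE
)

# Compare every cell against U+FFFD and overwrite the matches with NA
df[df == "\uFFFD"] <- NA

# Count the cells that were replaced
sum(is.na(df))
```

Note that the value columns are still character after this step, so they need converting (for example with as.numeric) before any mean can be computed.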
CodePudding user response:
As suggested by Allen Cameron, you can use as.numeric, which turns any string that is not a number into NA (with a warning). I will simply show you how to apply that to the columns (since you said it was a large database).
Example data
# A tibble: 5 × 3
     id values values_2
  <int> <chr>  <chr>
1     1 78     50
2     2 �      �
3     3 64     �
4     4 23     20
5     5 F      Random
library(dplyr)
# Non-numeric strings (including "�") become NA; R warns about the coercion
df %>%
  mutate(across(2:3, ~ as.numeric(.x)))
# A tibble: 5 × 3
     id values values_2
  <int>  <dbl>    <dbl>
1     1     78       50
2     2     NA       NA
3     3     64       NA
4     4     23       20
5     5     NA       NA
Rowwise mean() calculations, without the irrelevant id column
df %>%
  mutate(across(2:3, ~ as.numeric(.x))) %>%
  rowwise() %>%
  mutate(mean = mean(c_across(2:3), na.rm = TRUE))
# A tibble: 5 × 4
# Rowwise:
     id values values_2  mean
  <int>  <dbl>    <dbl> <dbl>
1     1     78       50  64
2     2     NA       NA NaN
3     3     64       NA  64
4     4     23       20  21.5
5     5     NA       NA NaN
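Since the data is large, a vectorised alternative may be worth considering: rowwise() evaluates one row at a time, whereas base R's rowMeans() handles all rows in one call. The sketch below is not the answer's method, just a base R variant under the assumption that the data looks like the example above (the data frame is rebuilt here by hand). It also replaces the NaN that appears for rows with no usable values with NA, which is optional.

```r
# Hypothetical data mirroring the example above
df <- data.frame(
  id       = 1:5,
  values   = c("78", "\uFFFD", "64", "23", "F"),
  values_2 = c("50", "\uFFFD", "\uFFFD", "20", "Random"),
  stringsAsFactors = FALSE
)

value_cols <- c("values", "values_2")

# Coerce to numeric; non-numeric strings become NA (warnings silenced on purpose)
df[value_cols] <- lapply(
  df[value_cols],
  function(x) suppressWarnings(as.numeric(x))
)

# rowMeans() is vectorised over rows, so no rowwise() step is needed
df$mean <- rowMeans(df[value_cols], na.rm = TRUE)

# Rows with no usable values yield NaN; turn those into NA if preferred
df$mean[is.nan(df$mean)] <- NA

df
```

Silencing the coercion warnings hides genuinely unexpected values such as "F" or "Random", so it may be safer to leave the warnings on while exploring the data.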