I hope someone can help me with this one!
I have the following dataset and want to create a new column where the values of aver1, aver2 and aver3 are represented.
I tried it with rowSums but this did not work for me because when i put na.rm = TRUE also those rows who have only empty columns have 0 as their sum and I can not differentiate these from the ones that actually do have 0 as their value.
What I have:
count | aver1. | aver2. | aver3. |
---|---|---|---|
X | NA | 1 | NA |
Y | 1 | NA | NA |
X | NA | NA | 0 |
What I want:
count | aver1. | aver2. | aver3. | aver_all |
---|---|---|---|---|
X | NA | 1 | NA | 1 |
Y | 1 | NA | NA | 1 |
X | NA | NA | 0 | 0 |
the dput output:
structure(list(count = c(0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,
1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
1), start = c(NA, NA, NA, 5, NA, NA, NA, NA, 1, NA, NA, NA, NA,
1, 1, 1, NA, NA, 9, NA, NA, NA, 3, 4, NA, 11, 1, NA, NA, 1, NA,
NA, NA, 6, NA, NA, 5, NA, 5, NA, NA, NA, NA, NA, 1, NA, 3, NA,
NA, 3, 1, NA, 13, NA, 0, NA, NA, NA, NA, 1, NA, NA, NA, 12, 1,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, 1, NA, NA, NA, NA,
2, NA, 2, NA, NA, NA, 2, NA, NA, 1, NA, 3, NA, 3, NA, NA, NA,
NA, 10, NA, 1, NA, 0, 0, 1, 1, NA, NA, NA, NA, NA, 1, NA, 2,
7, NA, 1, NA, NA, 3, NA, 2, 6, NA, 3, NA, 1, 8, 1, NA, 1, NA,
NA, 0, NA, 0, 1, NA, NA, NA, NA, 3, NA, 0, NA, NA, NA, 1, NA,
NA, 0, NA, NA, NA, NA, NA, 2, NA, NA, 0, NA, NA, NA, NA, NA,
NA, 1, NA, 4), aver1 = c(NA, NA, NA, 0.5, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.166666666666667, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.133333333333333, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0.266666666666667, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.566666666666667, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0.266666666666667, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), aver2 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0.333333333333333, 0.416666666666667, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.25, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.916666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.472222222222222,
NA, NA, NA, NA, NA, NA, 0.388888888888889, NA, NA, NA, 0.0833333333333333,
NA, NA, NA, NA, 0.0555555555555556, NA, 0.111111111111111, NA,
NA, NA, NA, NA, NA, NA, NA, 0.305555555555556, NA, 0.861111111111111,
NA, NA, NA, NA, NA, NA, NA, NA, 0.194444444444444, NA, NA, NA,
NA, NA, 0.611111111111111, NA, NA, NA, NA, 0, NA, 1, NA, 0.694444444444444,
NA, NA, NA, NA, 0.5, NA, 1, NA, NA, NA, NA, NA, 0.0277777777777778,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.138888888888889,
NA, NA, 0.583333333333333, NA, NA, NA, NA, NA, NA, 0.194444444444444,
NA, NA), aver3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0.514285714285714,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 0.0285714285714286, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 0.214285714285714, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0142857142857143, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.614285714285714, NA, NA, NA, NA, 0.371428571428571,
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA,
NA, NA, NA, 0.9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0571428571428571,
NA, NA, 0.128571428571429, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0.1)), row.names = c(NA, -170L
), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
This is an example that allows you to sum your selected variables from your data-frame (let's call this data-frame: 'df').
df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) sum(x, na.rm=TRUE))
It will add 0s to rows where there are only NAs for aver1-2-3.
The next code will replace by NAs, the rows with full NAs.
df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) ifelse(FALSE %in% is.na(x), sum(x, na.rm=TRUE), NA))
CodePudding user response:
Given that you have said that you also have rows where all column values are NAs, I will create an additional row in your dataset that fulfills this condition:
dataset <- tibble(count = c("X", "Y", "X", "Z"), aver1. = c(NA, 1, NA, NA),
aver2. = c(1, NA, NA, NA), aver3. = c(NA, NA, 0, NA))
You can use the conditional case_when (https://dplyr.tidyverse.org/reference/case_when.html), which will allow you to set values depending on the conditions you choose for each row. In this case, you could use:
dataset$aver_all <- case_when(is.na(aver1.) & is.na(aver2.) & is.na(aver3.) ~ NA_real_,
aver1. | aver2. | aver3. ~ 1,
TRUE ~ 0)
Here the first condition sets rows where all values are NA to NA, the second sets a 1 if at least one of the three values of a row is a 1; and finally if none of these conditions is satisfied, a 0 is set.