Home > OS >  Creating new column with values from multiple other columns
Creating new column with values from multiple other columns

Time:02-15

I hope someone can help me with this one!

I have the following dataset and want to create a new column where the values of aver1, aver2 and aver3 are represented.

I tried it with rowSums but this did not work for me because when i put na.rm = TRUE also those rows who have only empty columns have 0 as their sum and I can not differentiate these from the ones that actually do have 0 as their value.

What I have:

count aver1. aver2. aver3.
X NA 1 NA
Y 1 NA NA
X NA NA 0

What I want:

count aver1. aver2. aver3. aver_all
X NA 1 NA 1
Y 1 NA NA 1
X NA NA 0 0

the dput output:

structure(list(count = c(0, 
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 
1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 
0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 
0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 
1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 
1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 
1), start = c(NA, NA, NA, 5, NA, NA, NA, NA, 1, NA, NA, NA, NA, 
1, 1, 1, NA, NA, 9, NA, NA, NA, 3, 4, NA, 11, 1, NA, NA, 1, NA, 
NA, NA, 6, NA, NA, 5, NA, 5, NA, NA, NA, NA, NA, 1, NA, 3, NA, 
NA, 3, 1, NA, 13, NA, 0, NA, NA, NA, NA, 1, NA, NA, NA, 12, 1, 
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, 1, NA, NA, NA, NA, 
2, NA, 2, NA, NA, NA, 2, NA, NA, 1, NA, 3, NA, 3, NA, NA, NA, 
NA, 10, NA, 1, NA, 0, 0, 1, 1, NA, NA, NA, NA, NA, 1, NA, 2, 
7, NA, 1, NA, NA, 3, NA, 2, 6, NA, 3, NA, 1, 8, 1, NA, 1, NA, 
NA, 0, NA, 0, 1, NA, NA, NA, NA, 3, NA, 0, NA, NA, NA, 1, NA, 
NA, 0, NA, NA, NA, NA, NA, 2, NA, NA, 0, NA, NA, NA, NA, NA, 
NA, 1, NA, 4), aver1 = c(NA, NA, NA, 0.5, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.166666666666667, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 0.133333333333333, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0.266666666666667, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.566666666666667, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 0.266666666666667, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA), aver2 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 0.333333333333333, 0.416666666666667, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.25, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.916666666666667, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.472222222222222, 
NA, NA, NA, NA, NA, NA, 0.388888888888889, NA, NA, NA, 0.0833333333333333, 
NA, NA, NA, NA, 0.0555555555555556, NA, 0.111111111111111, NA, 
NA, NA, NA, NA, NA, NA, NA, 0.305555555555556, NA, 0.861111111111111, 
NA, NA, NA, NA, NA, NA, NA, NA, 0.194444444444444, NA, NA, NA, 
NA, NA, 0.611111111111111, NA, NA, NA, NA, 0, NA, 1, NA, 0.694444444444444, 
NA, NA, NA, NA, 0.5, NA, 1, NA, NA, NA, NA, NA, 0.0277777777777778, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.138888888888889, 
NA, NA, 0.583333333333333, NA, NA, NA, NA, NA, NA, 0.194444444444444, 
NA, NA), aver3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0.514285714285714, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 0.0285714285714286, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 1, 0.214285714285714, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0142857142857143, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.614285714285714, NA, NA, NA, NA, 0.371428571428571, 
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, 
NA, NA, NA, 0.9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0571428571428571, 
NA, NA, 0.128571428571429, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 0.1)), row.names = c(NA, -170L
), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

This is an example that allows you to sum your selected variables from your data-frame (let's call this data-frame: 'df').

df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) sum(x, na.rm=TRUE))

It will add 0s to rows where there are only NAs for aver1-2-3.

The next code will replace by NAs, the rows with full NAs.

df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) ifelse(FALSE %in% is.na(x), sum(x, na.rm=TRUE), NA))

CodePudding user response:

Given that you have said that you also have rows where all column values are NAs, I will create an additional row in your dataset that fulfills this condition:

dataset <- tibble(count = c("X", "Y", "X", "Z"), aver1. = c(NA, 1, NA, NA),
                  aver2. = c(1, NA, NA, NA), aver3. = c(NA, NA, 0, NA))

You can use the conditional case_when (https://dplyr.tidyverse.org/reference/case_when.html), which will allow you to set values depending on the conditions you choose for each row. In this case, you could use:

dataset$aver_all <- case_when(is.na(aver1.) & is.na(aver2.) & is.na(aver3.) ~ NA_real_,
                              aver1. | aver2. | aver3. ~ 1,
                              TRUE ~ 0)

Here the first condition sets rows where all values are NA to NA, the second sets a 1 if at least one of the three values of a row is a 1; and finally if none of these conditions is satisfied, a 0 is set.

  • Related