Using the following code:
val_a <- c("2", "4", "6", "8")
divide_a <- c(1, 2, 3, 4)
val_b <- c("62", "42", "56", "28")
divide_b <- c(11, 12, 22, 44)
val_c <- c("652", "142", "546", "298")
divide_c <- c(74, 23, 112, 64)
df <- as.data.frame(cbind(val_a, divide_a, val_b, divide_b, val_c, divide_c))
giving the following data frame:
val_a divide_a val_b divide_b val_c divide_c
1 2 1 62 11 652 74
2 4 2 42 12 142 23
3 6 3 56 22 546 112
4 8 4 28 44 298 64
I'm trying to divide "val" by "divide" for the three sets of values, giving something like:
val_a divide_a val_b divide_b val_c divide_c result_ a result_b result_c
1 2 1 62 11 652 74 2 5,6 8,8
2 4 2 42 12 142 23 4 3,8 6,2
3 6 3 56 22 546 112 etc.
4 8 4 28 44 298 64
So far I've been trying to use a for loop:
for(i in 1:ncol(df)) { # for-loop over columns
df[ , paste0("result", i)] <- as.numeric((df[ ,i]))/as.numeric((df[ ,i 1]))
}
The problems are that:
- It divides every column by the next when it should skip one (e.g. not divide "divide_a" by "val_b")
- Doesn't name the columns
I am at loss what to do and would appreciate any help. My actual dataframe has over 1000 columns so I'd rather not do this manually. Thank you.
CodePudding user response:
I'm sure there is a concise (but inscrutable to me) way to do this in base R, but my personal preference for this sort of problem is to use dplyr
/tidyr
to reshape the data and keep it simple.
The code below will take any number of "val"/"divide" pairs and calculate the ratio.
library(tidyverse)
df %>%
mutate(row = row_number()) %>%
pivot_longer(-row, names_to = c("operation", "group"),
names_sep = "_", values_transform = as.numeric) %>%
pivot_wider(names_from = operation, values_from = value) %>%
mutate(result = val / divide)
First, I added a row number variable to help keep track of the data between reshaping steps. Then I pivot_longer
to convert the original data columns into three new columns, "operation," "group", and "value" (that's the default name). Then I reshape wider based on "operation", and then it's straightforward to divide each val
by its associated divide
value.
The result at this point is
# A tibble: 12 × 5
row group val divide result
<int> <chr> <dbl> <dbl> <dbl>
1 1 a 2 1 2
2 1 b 62 11 5.64
3 1 c 652 74 8.81
4 2 a 4 2 2
5 2 b 42 12 3.5
6 2 c 142 23 6.17
7 3 a 6 3 2
8 3 b 56 22 2.55
9 3 c 546 112 4.88
10 4 a 8 4 2
11 4 b 28 44 0.636
12 4 c 298 64 4.66
but we could reshape it if desired. We might add to the chain above:
... %>%
select(-val, -divide) %>%
pivot_wider(names_from = group, names_prefix = "result_",
values_from = result)
to get:
# A tibble: 4 × 4
row result_a result_b result_c
<int> <dbl> <dbl> <dbl>
1 1 2 5.64 8.81
2 2 2 3.5 6.17
3 3 2 2.55 4.88
4 4 2 0.636 4.66
CodePudding user response:
for loop
way is
key <- str_remove(names(df)[startsWith(names(df), "val_")], "val_")
for (i in key){
col1 <- paste0("val_", i)
col2 <- paste0("divide_", i)
col3 <- paste0("result_", i)
df[col3] <- as.numeric(df[,col1]) / as.numeric(df[,col2])
}
df
val_a divide_a val_b divide_b val_c divide_c result_a result_b result_c
1 2 1 62 11 652 74 2 5.6363636 8.810811
2 4 2 42 12 142 23 2 3.5000000 6.173913
3 6 3 56 22 546 112 2 2.5454545 4.875000
4 8 4 28 44 298 64 2 0.6363636 4.656250
CodePudding user response:
Try this for loop
for(i in 1:(ncol(df)/2)) { # for-loop over columns
j <- 2*i - 1
if(j < ncol(df))
df[[paste0("result", i)]] <- as.numeric((df[ ,j]))/as.numeric((df[ ,j 1]))
}
CodePudding user response:
The concise but scrutable base R version without reshaping would be:
vars <- c("a","b","c")
df[paste("result", vars, sep="_")] <- df[paste("val", vars, sep="_")] /
df[paste("divide", vars, sep="_")]
df
# val_a divide_a val_b divide_b val_c divide_c result_a result_b result_c
#1 2 1 62 11 652 74 2 5.6363636 8.810811
#2 4 2 42 12 142 23 2 3.5000000 6.173913
#3 6 3 56 22 546 112 2 2.5454545 4.875000
#4 8 4 28 44 298 64 2 0.6363636 4.656250
The equivalent reshape
-ing option to dplyr possibilities would be:
long <- reshape(df, varying=TRUE, direction="long", sep="_")
long$result <- long$val/long$divide
reshape(long, idvar="id", direction="wide", sep="_")
...which could even be |>
piped:
reshape(df, varying=TRUE, direction="long", sep="_") |>
transform(result = val/divide) |>
reshape(idvar="id", direction="wide", sep="_")