Home > Net >  Create new columns from the results of on operation from previous columns
Create new columns from the results of on operation from previous columns

Time:06-10

Using the following code:

val_a <- c("2", "4", "6", "8")
divide_a <- c(1, 2, 3, 4)
val_b <- c("62", "42", "56", "28")
divide_b <- c(11, 12, 22, 44)
val_c <- c("652", "142", "546", "298")
divide_c <- c(74, 23, 112, 64)
df <- as.data.frame(cbind(val_a, divide_a, val_b, divide_b, val_c, divide_c))

giving the following data frame:

  val_a divide_a val_b divide_b val_c divide_c
1     2        1    62       11   652       74
2     4        2    42       12   142       23
3     6        3    56       22   546      112
4     8        4    28       44   298       64

I'm trying to divide "val" by "divide" for the three sets of values, giving something like:

  val_a divide_a val_b divide_b val_c divide_c result_ a result_b result_c
1     2        1    62       11   652       74 2       5,6        8,8
2     4        2    42       12   142       23 4       3,8        6,2
3     6        3    56       22   546      112 etc.
4     8        4    28       44   298       64

So far I've been trying to use a for loop:

for(i in 1:ncol(df)) {       # for-loop over columns
  df[ , paste0("result", i)] <- as.numeric((df[ ,i]))/as.numeric((df[ ,i 1]))
}

The problems are that:

  1. It divides every column by the next when it should skip one (e.g. not divide "divide_a" by "val_b")
  2. Doesn't name the columns

I am at loss what to do and would appreciate any help. My actual dataframe has over 1000 columns so I'd rather not do this manually. Thank you.

CodePudding user response:

I'm sure there is a concise (but inscrutable to me) way to do this in base R, but my personal preference for this sort of problem is to use dplyr/tidyr to reshape the data and keep it simple.

The code below will take any number of "val"/"divide" pairs and calculate the ratio.

library(tidyverse)
df %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = c("operation", "group"), 
               names_sep = "_", values_transform = as.numeric) %>%
  pivot_wider(names_from = operation, values_from = value) %>%
  mutate(result = val / divide)
  

First, I added a row number variable to help keep track of the data between reshaping steps. Then I pivot_longer to convert the original data columns into three new columns, "operation," "group", and "value" (that's the default name). Then I reshape wider based on "operation", and then it's straightforward to divide each val by its associated divide value.

The result at this point is

# A tibble: 12 × 5
     row group   val divide result
   <int> <chr> <dbl>  <dbl>  <dbl>
 1     1 a         2      1  2    
 2     1 b        62     11  5.64 
 3     1 c       652     74  8.81 
 4     2 a         4      2  2    
 5     2 b        42     12  3.5  
 6     2 c       142     23  6.17 
 7     3 a         6      3  2    
 8     3 b        56     22  2.55 
 9     3 c       546    112  4.88 
10     4 a         8      4  2    
11     4 b        28     44  0.636
12     4 c       298     64  4.66 

but we could reshape it if desired. We might add to the chain above:

... %>% 
select(-val, -divide) %>%
pivot_wider(names_from = group, names_prefix = "result_",
            values_from = result)

to get:

# A tibble: 4 × 4
    row result_a result_b result_c
  <int>    <dbl>    <dbl>    <dbl>
1     1        2    5.64      8.81
2     2        2    3.5       6.17
3     3        2    2.55      4.88
4     4        2    0.636     4.66

CodePudding user response:

for loop way is

key <- str_remove(names(df)[startsWith(names(df), "val_")], "val_") 
for (i in key){
  col1 <- paste0("val_", i)
  col2 <- paste0("divide_", i)
  col3 <- paste0("result_", i)
  df[col3] <- as.numeric(df[,col1]) / as.numeric(df[,col2])
}
df

  val_a divide_a val_b divide_b val_c divide_c result_a  result_b result_c
1     2        1    62       11   652       74        2 5.6363636 8.810811
2     4        2    42       12   142       23        2 3.5000000 6.173913
3     6        3    56       22   546      112        2 2.5454545 4.875000
4     8        4    28       44   298       64        2 0.6363636 4.656250

CodePudding user response:

Try this for loop

for(i in 1:(ncol(df)/2)) {       # for-loop over columns
    j <- 2*i - 1
    if(j < ncol(df))
        df[[paste0("result", i)]] <- as.numeric((df[ ,j]))/as.numeric((df[ ,j 1]))
}

CodePudding user response:

The concise but scrutable base R version without reshaping would be:

vars <- c("a","b","c")
df[paste("result", vars, sep="_")] <- df[paste("val",    vars, sep="_")] / 
                                      df[paste("divide", vars, sep="_")]

df
#  val_a divide_a val_b divide_b val_c divide_c result_a  result_b result_c
#1     2        1    62       11   652       74        2 5.6363636 8.810811
#2     4        2    42       12   142       23        2 3.5000000 6.173913
#3     6        3    56       22   546      112        2 2.5454545 4.875000
#4     8        4    28       44   298       64        2 0.6363636 4.656250

The equivalent reshape-ing option to dplyr possibilities would be:

long <- reshape(df, varying=TRUE, direction="long", sep="_")
long$result <- long$val/long$divide
reshape(long, idvar="id", direction="wide", sep="_")

...which could even be |> piped:

reshape(df, varying=TRUE, direction="long", sep="_") |>
    transform(result = val/divide) |>
    reshape(idvar="id", direction="wide", sep="_")
  • Related