Home > Software design >  Arranging columns in data frame with subtraction percentage difference from all previous rows
Arranging columns in data frame with subtraction percentage difference from all previous rows

Time:03-02

Here is an example of what i need in R. I have this kind of data

df <-
Scenario    Group1  Group2
S1             8    9
S2             7    8
S3             6    7
S4             4    5
S5             3    4

From this above data i want to obtain following data

df_converted <-
Scenario1   Scenario2   Group1  Group2
S1              S2        -1    -1
S1              S3        -2    -2
S1              S4        -4    -4
S1              S5        -5    -5
S2              S3        -1    -1
S2              S4        -3    -3
S2              S5        -4    -4
S3              S4        -2    -2
S3              S5        -3    -3
S4              S5        -1    -1

Each row in new data is obtained by subtracting previous or lagging row. And this subtraction is obtained for all combination of scenarios in first columns. If the solution is in dplyr then it will be extremely helpful thanks. The above example is for subtraction i need another converted data that give percentage difference between rows

CodePudding user response:

You could use a full_join() here, but there is another option (below), which is faster, and takes advantage of combn()

full_join(df,df, by=character()) %>% 
  filter(Scenario.x<Scenario.y) %>% 
  mutate(
    Group1 = Group1.y-Group1.x,
    Group2 = Group2.y-Group2.x,
    Perc1 = (Group1.y-Group1.x)/Group1.y,
    Perc2 = (Group2.y-Group2.x)/Group2.y
  ) %>% 
  select(Scenario1 = Scenario.x,
         Sceanrio2 = Scenario.y,
         Group1, Group2, Perc1,Perc2)

Output:

   Scenario1 Sceanrio2 Group1 Group2  Perc1  Perc2
   <chr>     <chr>      <dbl>  <dbl>  <dbl>  <dbl>
 1 S1        S2            -1     -1 -0.143 -0.125
 2 S1        S3            -2     -2 -0.333 -0.286
 3 S1        S4            -4     -4 -1     -0.8  
 4 S1        S5            -5     -5 -1.67  -1.25 
 5 S2        S3            -1     -1 -0.167 -0.143
 6 S2        S4            -3     -3 -0.75  -0.6  
 7 S2        S5            -4     -4 -1.33  -1    
 8 S3        S4            -2     -2 -0.5   -0.4  
 9 S3        S5            -3     -3 -1     -0.75 
10 S4        S5            -1     -1 -0.333 -0.25 

Updated, additional faster option:

# Make sure Scenario is sorted
df <- df %>% arrange(Scenario)

# Create a function the gets difference and percent, given all combn
f <- function(df) {
  d= df[2,] - df[1,]
  p = d/df[2,]
  tibble(d,p)
}

# Column bind the results
do.call(cbind, list(
  as_tibble(t(combn(df$Scenario,2)),.name_repair = ~c("Scenario1","Scenario2")),
  df[,2:3] %>% map(~f(combn(.x,2)))
))

Output:

   Scenario1 Scenario2 Group1.d   Group1.p Group2.d   Group2.p
1         S1        S2       -1 -0.1428571       -1 -0.1250000
2         S1        S3       -2 -0.3333333       -2 -0.2857143
3         S1        S4       -4 -1.0000000       -4 -0.8000000
4         S1        S5       -5 -1.6666667       -5 -1.2500000
5         S2        S3       -1 -0.1666667       -1 -0.1428571
6         S2        S4       -3 -0.7500000       -3 -0.6000000
7         S2        S5       -4 -1.3333333       -4 -1.0000000
8         S3        S4       -2 -0.5000000       -2 -0.4000000
9         S3        S5       -3 -1.0000000       -3 -0.7500000
10        S4        S5       -1 -0.3333333       -1 -0.2500000
  • Related