Home > Blockchain >  Subracting rows from dataframe
Subracting rows from dataframe

Time:09-30

Let's say I have the following dataset:

Industry Country Year  AUS  AUS  AUT   AUT ...
 A         AUS    1    0.5  0.2  0.1   0.01 
 B         AUS    2    0.3  0.5  2     0.1
 A         AUT    3    1    1.2  1.3   0.3
 B         AUT    4    0.5  0    0.8   2
 ...       ...   ...   ...  ...  ...  ....
 VA                    11   10    47   55
 tot                   24   23    50   70          

How can I subtract ONLY the last two rows(tot= tot-VA) to get:

Industry Country Year  AUS  AUS  AUT   AUT ...
 A         AUS    1    0.5  0.2  0.1   0.01 
 B         AUS    2    0.3  0.5  2     0.1
 A         AUT    3    1    1.2  1.3   0.3
 B         AUT    4    0.5  0    0.8   2
 ...       ...   ...   ...  ...  ...  ....
 VA                    11   10    47   55
 FI                    13   13    3    15  
 FI/VA                 1.2  1.3  0.06  0.27

Where FI is simply tot-VA

CodePudding user response:

You could try this:

  1. check which columns are numeric

  2. use sapply to calculate the new row FI

  3. bind them together

num <- sapply(df, class) == "numeric"
df_tot<- data.frame(as.list(sapply(df[, num], function(x) x[length(x)]-x[length(x)-1])))
df_tot$Industry <- "FI"

df <- data.table::rbindlist(list(df, df_tot), fill = TRUE)

EDIT:

If you just want to sum up all rows but the last one, then you could try this:

num <- sapply(df, class) == "numeric"
df_tot <- data.frame(as.list(sapply(df[1:(nrow(df)-1), num], sum)))

df_tot$Industry <- "FI"

df <- data.table::rbindlist(list(df, df_tot), fill = TRUE)

CodePudding user response:

Here's a tidyverse approach to the issue of subtracting selected rows across columns:

library(tidyverse)
df %>% 
  # subtract across the relevant columns:
  summarise(across(matches("^AU"), ~(.x[Industry == "tot"] - .x[Industry == "VA"]))) %>%
  # add the 'new' column `Industry`:
  mutate(Industry = "FI") %>%
  # bind result back into `df`:
  bind_rows(df,.)
  Industry  AU1  AU2  AU3
1        A  0.1  0.4  7.0
2        B  0.7  3.0  1.0
3        A  3.0  2.5  0.1
4       VA 11.0 10.0 47.0
5      tot 24.0 23.0 50.0
6       FI 13.0 13.0  3.0

If you no longer need rows #4 and #5, add this to the pipe:

filter(!Industry %in% c("VA", "tot"))

Data:

df <- data.frame(
  Industry = c("A","B","A","VA","tot"),
  AU1 = c(0.1,0.7,3,11,24),
  AU2 = c(0.4,3,2.5,10,23),
  AU3 = c(7, 1, 0.1,47,50)
)
  • Related