R: Group by and Apply a function to two columns

Time:02-19

Hi, I'd like to group by two data frame columns and apply a function to another two data frame columns. For example:

ticker <- c("A", "A", 'A', "B", "B", "B")
date <- c(1,1,2,1,2,1)
ret <- c(1,-2,4,6,9,-5)
vol <- c(3,5,1,6,2,3)
df <- data.frame(ticker,date,ret,vol)

For each ticker and each date, I'd like to calculate the sum of vol + ret. (What I actually want is much more complicated than this; I just want to apply a function to vol and ret.)

My function is:

get_rv <- function(data) {
  return(sum(data[['vol']] + data[['ret']]))  # one total per group
}

What I want is:

ticker_wanted <- c('A','A', 'B', 'B')
date_wanted <- c(1,2,1,2)
rv_wanted <- c(7,5,10,11)
df_wanted <-data.frame(ticker_wanted,date_wanted,rv_wanted)
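The wanted values are per-group totals of ret + vol, not row-wise sums. For example, for ticker A on date 1: 1 + 3 + (-2) + 5 = 7. The following base-R check with tapply() is only a sanity check of that arithmetic, not a replacement for the grouped data frame the question asks for:

```r
# Example data from the question
ticker <- c("A", "A", "A", "B", "B", "B")
date <- c(1, 1, 2, 1, 2, 1)
ret <- c(1, -2, 4, 6, 9, -5)
vol <- c(3, 5, 1, 6, 2, 3)
df <- data.frame(ticker, date, ret, vol)

# Sum ret + vol inside every (ticker, date) cell; rows are tickers, columns are dates
rv_check <- tapply(df$ret + df$vol, list(df$ticker, df$date), sum)
print(rv_check)
# A,1 = 7; A,2 = 5; B,1 = 10; B,2 = 11
```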

I know how to do this in Python: I just write a function and run df.groupby(['ticker','date']).apply(function). However, I don't know how to do this in R.

Could somebody help out please?

Thank you!

Best,

Darcy
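For readers coming from pandas, a rough base-R counterpart of df.groupby([...]).apply(f) is split() + lapply() + do.call(rbind, ...). This is a minimal sketch that assumes the function receives one group's rows as a data frame and returns a single number. The get_rv body here is the question's function with a sum() added, so treat it as a stand-in for the real, more complicated function:

```r
# Example data from the question
df <- data.frame(
  ticker = c("A", "A", "A", "B", "B", "B"),
  date   = c(1, 1, 2, 1, 2, 1),
  ret    = c(1, -2, 4, 6, 9, -5),
  vol    = c(3, 5, 1, 6, 2, 3)
)

# Stand-in function: takes one group's rows, returns one value
get_rv <- function(data) {
  sum(data[["vol"]] + data[["ret"]])
}

# Split rows by every (ticker, date) combination that actually occurs
groups <- split(df, list(df$ticker, df$date), drop = TRUE)

# Apply the function to each piece, keeping the group keys alongside the result
pieces <- lapply(groups, function(g) {
  data.frame(ticker = g$ticker[1], date = g$date[1], rv = get_rv(g))
})

# Combine the pieces and sort by the group keys
result <- do.call(rbind, pieces)
result <- result[order(result$ticker, result$date), ]
rownames(result) <- NULL
print(result)
```

Because each piece is a full data frame, the function can use any number of columns. This is the main similarity to pandas' apply.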

Answer:

In your example, you can do:

library(tidyverse)

my_function <- function(data) {
  data %>%
    summarize(rv = sum(ret, vol))
}

df %>%
  group_by(ticker, date) %>%
  my_function()

# A tibble: 4 x 3
# Groups:   ticker [2]
  ticker  date    rv
  <chr>  <dbl> <dbl>
1 A          1     7
2 A          2     5
3 B          1    10
4 B          2    11

But as mentioned in my comment, I'm not sure this general example will help in your real-life use case.

It might also be that you don't need to write your own function, because built-in functions already exist. In this example, you're better off summarizing directly instead of wrapping the summary in a function.
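When the real function does need a whole group at once, dplyr's group_modify() is closer to pandas' groupby().apply(). It calls a function once per group with .x (that group's rows, with grouping columns removed) and .y (a one-row table of the group keys), and the function must return a data frame. This is a sketch; rv_per_group is a hypothetical name standing in for the real function:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  ticker = c("A", "A", "A", "B", "B", "B"),
  date   = c(1, 1, 2, 1, 2, 1),
  ret    = c(1, -2, 4, 6, 9, -5),
  vol    = c(3, 5, 1, 6, 2, 3)
)

# Hypothetical per-group function: .x = group rows, .y = group keys (unused here)
rv_per_group <- function(.x, .y) {
  data.frame(rv = sum(.x$ret) + sum(.x$vol))
}

result <- df %>%
  group_by(ticker, date) %>%
  group_modify(rv_per_group) %>%  # group keys are prepended to each returned data frame
  ungroup()

print(result)
```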

Answer:

You could just do this (with summarise standing in for your function):

ticker <- c("A", "A", 'A', "B", "B", "B")
date <- c(1,1,2,1,2,1)
ret <- c(1,-2,4,6,9,-5)
vol <- c(3,5,1,6,2,3)
df <- data.frame(ticker,date,ret,vol)

library(dplyr)

# Define the function before calling it
get_rv <- function(data){
  result <- data %>%
    group_by(ticker,date) %>%
    summarise(rv = sum(ret) + sum(vol)) %>%
    as.data.frame()
  names(result) <- c('ticker_wanted', 'date_wanted', 'rv_wanted')
  return(result)
}

df_wanted <- get_rv(df)
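If you want to keep a function that takes a data frame of just the two columns, as in the question, one option is to pass those columns per group with pick(). This sketch assumes dplyr 1.1.0 or later, where pick() was introduced. The get_rv body is again the question's function with sum() added:

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Example data from the question
df <- data.frame(
  ticker = c("A", "A", "A", "B", "B", "B"),
  date   = c(1, 1, 2, 1, 2, 1),
  ret    = c(1, -2, 4, 6, 9, -5),
  vol    = c(3, 5, 1, 6, 2, 3)
)

# Function that receives a data frame with the ret and vol columns of one group
get_rv <- function(data) {
  sum(data[["vol"]] + data[["ret"]])
}

df_wanted <- df %>%
  group_by(ticker, date) %>%
  summarise(rv = get_rv(pick(ret, vol)), .groups = "drop") %>%
  as.data.frame()

print(df_wanted)
```

This keeps the grouping in dplyr and the computation in your own function, so a more complicated get_rv can be dropped in without changing the pipeline.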