Home > Enterprise >  Pulling a range of Values based on date range AND name
Pulling a range of Values based on date range AND name

Time:05-23

I'm trying to calculate the average expenses of a particular site over a given period during which a client was active. The data held in 2 separate tables

df1=

ClientID DateStart DateEnd Site
1 2020-02-13 2020-02-15 Albury
2 2020-02-14 2020-02-15 wodonga

df2=

Site Date Expense
Albury 2020-02-13 4
Albury 2020-02-14 5
Albury 2020-02-15 7
Albury 2020-02-16 4
Albury 2020-02-17 4
Wodonga 2020-02-14 100
Wodonga 2020-02-15 120

I was hoping for this

ClientID DateStart DateEnd Site Avg_exp
1 2020-02-13 2020-02-16 Albury 5.3
2 2020-02-14 2020-02-15 wodonga 110

I'm using SQLDF in R, and have come up with following steps:

sqldf("select avg(Expense) from df1,df2 WHERE df2.Site= 'Albury' AND df2.Date >= df1.DateStart AND df2.Date <= df1.DateEnd")

This gives me the average expense for Albury for the period that Client 1 was active - which is what i would like to attach to ClientID 1.

To do this, I try to add a new column in df1 that returns the average expense if df1.Site = df2.Site

df1$Avg_exp = sqldf("select avg(Expense) from df1,df2 WHERE df2.Site= df1.Site AND df2.Date >= df1.DateStart AND df2.Date <= df1.DateEnd")

This just calculates the Average expense for the entire of df2.

I've also tried just putting out a new df where I have grouped the average expense by clientID, with the aim of then just joining it back to DF1

df3=sqldf("select avg(Expense) from df1,df2 WHERE df2.Site= df1.Site AND df2.Date >= df1.DateStart AND df2.Date <= df1.DateEnd GROUP BY df1.ClientID")

But I get the Error "Parameter 5 does not have length 2"

Any suggestions?

Thanks!

CodePudding user response:

What you actually need is to use a left join for the two tables. So you would get something like that:

SELECT df1.ClientId, df1.DateStart, df1.DateEnd, df1.Site, AVG(df1.Expense) as Avg_exp
FROM df1
LEFT JOIN df2 ON df2.Site = df1.Site
     WHERE df2.Date BETWEEN df1.DateStart AND df1.DateEnd
GROUP BY df1.ClientId, df1.DateStart, df1.DateEnd, df1.Site

Doing the above with a comparison over strings may not give you the expected result though. For example, 'wodonga' is not the same as 'Wodonga'.

  •  Tags:  
  • sql r
  • Related