Suppose I have the following dataset:
Year | Country | Tax |
---|---|---|
2000 | Austria | 5 |
2005 | Belgium | 21 |
2006 | Austria | 10 |
2001 | Austria | 5 |
2002 | Austria | 6 |
2006 | Belgium | 22 |
I want to do attain something like this:
Year | Country | Tax | Dummy |
---|---|---|---|
2000 | Austria | 5 | 0 |
2005 | Belgium | 21 | 0 |
2006 | Austria | 10 | 0 |
2001 | Austria | 5 | 0 |
2002 | Austria | 6 | 1 |
2006 | Belgium | 22 | 1 |
So I want a dummy if there was a tax increase relative to the country's previous year. If there is no info regarding the year before, then set it to zero.
Note this is a large dataset and does not only apply for two
Thanks in advance!
Im not quite sure where and how to start unfortunately
CodePudding user response:
library(tidyverse)
df %>%
arrange(Country, Year) %>%
group_by(Country) %>%
mutate(Dummy = case_when(Tax > lag(Tax) &
Year == lag(Year) 1 ~ 1,
TRUE ~ 0))
# A tibble: 6 × 4
# Groups: Country [2]
Year Country Tax Dummy
<dbl> <chr> <dbl> <dbl>
1 2000 Austria 5 0
2 2001 Austria 5 0
3 2002 Austria 6 1
4 2006 Austria 10 0
5 2005 Belgium 21 0
6 2006 Belgium 22 1
CodePudding user response:
Create a dataframe of same size but lagged and compare with the original
df=df[order(df$Country,df$Year),]
df2=df[
ave(
row.names(df),
df$Country,
FUN=function(x){
c(head(x,1),head(x,-1))
}
),
]
df$Dummy=(df$Year==df2$Year 1 & df$Country==df2$Country & df$Tax>df2$Tax)*1L
Year Country Tax rnk Dummy
1 2000 Austria 5 1 0
4 2001 Austria 5 2 0
5 2002 Austria 6 3 1
3 2006 Austria 10 4 0
2 2005 Belgium 21 1 0
6 2006 Belgium 22 2 1