Home > Net >  Dummy Variable manipulation
Dummy Variable manipulation

Time:01-31

Suppose I have the following dataset:

Year Country Tax
2000 Austria 5
2005 Belgium 21
2006 Austria 10
2001 Austria 5
2002 Austria 6
2006 Belgium 22

I want to do attain something like this:

Year Country Tax Dummy
2000 Austria 5 0
2005 Belgium 21 0
2006 Austria 10 0
2001 Austria 5 0
2002 Austria 6 1
2006 Belgium 22 1

So I want a dummy if there was a tax increase relative to the country's previous year. If there is no info regarding the year before, then set it to zero.

Note this is a large dataset and does not only apply for two

Thanks in advance!

Im not quite sure where and how to start unfortunately

CodePudding user response:

library(tidyverse)

df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  mutate(Dummy = case_when(Tax > lag(Tax) &
                             Year == lag(Year)   1 ~ 1,
                           TRUE ~ 0))

# A tibble: 6 × 4
# Groups:   Country [2]
   Year Country   Tax Dummy
  <dbl> <chr>   <dbl> <dbl>
1  2000 Austria     5     0
2  2001 Austria     5     0
3  2002 Austria     6     1
4  2006 Austria    10     0
5  2005 Belgium    21     0
6  2006 Belgium    22     1

CodePudding user response:

Create a dataframe of same size but lagged and compare with the original

df=df[order(df$Country,df$Year),]

df2=df[
  ave(
    row.names(df),
    df$Country,
    FUN=function(x){
      c(head(x,1),head(x,-1))
    }
  ),
]

df$Dummy=(df$Year==df2$Year 1 & df$Country==df2$Country & df$Tax>df2$Tax)*1L

  Year Country Tax rnk Dummy
1 2000 Austria   5   1     0
4 2001 Austria   5   2     0
5 2002 Austria   6   3     1
3 2006 Austria  10   4     0
2 2005 Belgium  21   1     0
6 2006 Belgium  22   2     1
  • Related