Home > OS >  Create dummy variable based on if df1 rowdate is between two dates in df2 and df1 class match df2 cl
Create dummy variable based on if df1 rowdate is between two dates in df2 and df1 class match df2 cl

Time:11-11

I have two dataframes. df1 with several hundred thousand rows each indicating a parliamentary speech. These have columns that indicate the date of the speech, the party, speaker etc.

df1 looks like this:

date         speechnumber                speaker party party.facts.id chair terms   parliament iso3country year pyear
1 1997-10-07            3   Poul Nyrup Rasmussen     S            379 FALSE 18662 DK-Folketing         DNK 1997     1
2 1997-10-09            3            Torben Lund     S            379 FALSE  2865 DK-Folketing         DNK 1997     1
3 1997-10-09            5             Frank Aaen    EL           1527 FALSE   159 DK-Folketing         DNK 1997     1
4 1997-10-09            6         Pia Kjærsgaard    DF           1022 FALSE   195 DK-Folketing         DNK 1997     1
5 1997-10-09            7 Kristian Thulesen Dahl    DF           1022 FALSE   202 DK-Folketing         DNK 1997     1
6 1997-10-09           10         Jette Gottlieb    EL           1527 FALSE   179 DK-Folketing         DNK 1997     1

Then I have created another one (df2) where I have specified which parties formed government at what times. I have made this into long format (not sure whether this is best), where 1 indicates that party x was in government in the period from "start_date" to "end_date".

df2 looks like this:

    start_date   end_date party values
1  1996-12-30 1998-03-23    RV      1
2  1998-03-23 2001-11-27    RV      1
3  2001-11-27 2005-02-18    RV      0
4  2005-02-18 2007-11-23    RV      0
5  2007-11-23 2009-04-05    RV      0
6  2009-04-05 2011-10-03    RV      0
7  2011-10-03 2014-02-03    RV      1
8  2014-02-03 2015-06-28    RV      1
9  2015-06-28 2016-11-28    RV      0
10 2016-11-28 2019-06-27    RV      0
11 2019-06-27 2020-10-27    RV      0
12 1996-12-30 1998-03-23     S      1
13 1998-03-23 2001-11-27     S      1
14 2001-11-27 2005-02-18     S      0
15 2005-02-18 2007-11-23     S      0
16 2007-11-23 2009-04-05     S      0
17 2009-04-05 2011-10-03     S      0
18 2011-10-03 2014-02-03     S      1
19 2014-02-03 2015-06-28     S      1
20 2015-06-28 2016-11-28     S      0
21 2016-11-28 2019-06-27     S      0
22 2019-06-27 2020-10-27     S      1
23 1996-12-30 1998-03-23     V      0
24 1998-03-23 2001-11-27     V      0
25 2001-11-27 2005-02-18     V      1
26 2005-02-18 2007-11-23     V      1
27 2007-11-23 2009-04-05     V      1
28 2009-04-05 2011-10-03     V      1
29 2011-10-03 2014-02-03     V      0
30 2014-02-03 2015-06-28     V      0
31 2015-06-28 2016-11-28     V      1
32 2016-11-28 2019-06-27     V      1
33 2019-06-27 2020-10-27     V      0
34 1996-12-30 1998-03-23    KF      0
35 1998-03-23 2001-11-27    KF      0
36 2001-11-27 2005-02-18    KF      1
37 2005-02-18 2007-11-23    KF      1
38 2007-11-23 2009-04-05    KF      1
39 2009-04-05 2011-10-03    KF      1
40 2011-10-03 2014-02-03    KF      0
41 2014-02-03 2015-06-28    KF      0
42 2015-06-28 2016-11-28    KF      0
43 2016-11-28 2019-06-27    KF      1
44 2019-06-27 2020-10-27    KF      0
45 1996-12-30 1998-03-23    SF      0
46 1998-03-23 2001-11-27    SF      0
47 2001-11-27 2005-02-18    SF      0
48 2005-02-18 2007-11-23    SF      0
49 2007-11-23 2009-04-05    SF      0
50 2009-04-05 2011-10-03    SF      0
51 2011-10-03 2014-02-03    SF      1
52 2014-02-03 2015-06-28    SF      0
53 2015-06-28 2016-11-28    SF      0
54 2016-11-28 2019-06-27    SF      0
55 2019-06-27 2020-10-27    SF      0
56 1996-12-30 1998-03-23    LA      0
57 1998-03-23 2001-11-27    LA      0
58 2001-11-27 2005-02-18    LA      0
59 2005-02-18 2007-11-23    LA      0
60 2007-11-23 2009-04-05    LA      0
61 2009-04-05 2011-10-03    LA      0
62 2011-10-03 2014-02-03    LA      0
63 2014-02-03 2015-06-28    LA      0
64 2015-06-28 2016-11-28    LA      0
65 2016-11-28 2019-06-27    LA      1
66 2019-06-27 2020-10-27    LA      0

What I want is to create a dummy variable in df1 indicating whether party x was in government at the time of the speech. I have tried so many different ways but I cannot seem to figure it out. Can anyone help?

CodePudding user response:

library(data.table)
setDT(df1); setDT(df2)
# set dates to actual date-format (if not already)
df1[, date := as.Date(date)]
df2[, start_date := as.Date(start_date)]
df2[, end_date := as.Date(end_date)]
# initialise the 'ingov' column to "no"
df1[, ingov := "no"]
df1[df2[values == 1,], 
    ingov := "yes", 
    on = .(party, date >= start_date, date < end_date)]
#          date                speaker party year pyear ingov
# 1: 1997-10-07   Poul Nyrup Rasmussen     S 1997     1   yes
# 2: 1997-10-09            Torben Lund     S 1997     1   yes
# 3: 1997-10-09             Frank Aaen    EL 1997     1    no
# 4: 1997-10-09         Pia Kjærsgaard    DF 1997     1    no
# 5: 1997-10-09 Kristian Thulesen Dahl    DF 1997     1    no
# 6: 1997-10-09         Jette Gottlieb    EL 1997     1    no
  • Related