Home > Net >  How do I calculate changes in panel data in R
How do I calculate changes in panel data in R

Time:09-02

I have a panel data set with two waves. I want to calculate the (possible) change from wave 1 to wave 2 of a selected variable and add the result as a new variable to the dataset. Two challenges: There are a considerable number of respondents who participated either only in the first wave or only in the second wave. Also, there are many NAs, which together make the use of lag or diff function difficult I think.

Example:

RespondentID <- c("id_001","id_002","id_002","id_003", "id_004", "id_004", "id_005", "id_005")
Wave <- c(1,1,2,2,1,2,1,2)
Value <- c(2,2,3,4,3,1,NA,3)
panel <- cbind(RespondentID,Wave,Value)

panel

 RespondentID Wave Value
[1,] "id_001"     "1"  "2"  
[2,] "id_002"     "1"  "2"  
[3,] "id_002"     "2"  "3"  
[4,] "id_003"     "2"  "4"  
[5,] "id_004"     "1"  "3"  
[6,] "id_004"     "2"  "1"  
[7,] "id_005"     "1"  NA   
[8,] "id_005"     "2"  "3"  

The result should look like this:

     RespondentID Wave Value Change
[1,] "id_001"     "1"  "2"   NA    
[2,] "id_002"     "1"  "2"   NA    
[3,] "id_002"     "2"  "3"   "1"   
[4,] "id_003"     "2"  "4"   NA    
[5,] "id_004"     "1"  "3"   NA    
[6,] "id_004"     "2"  "1"   "-2"  
[7,] "id_005"     "1"  NA    NA    
[8,] "id_005"     "2"  "3"   NA   

CodePudding user response:

You could use a data structure that recognises the panel data property of your data and "panel-aware" functions. pdata.frame in package plm is one candidate. See the example below, picking up from your example (but putting the various variables via data.frame together first (your approach with cbind lumps the columns together but converts to a character)):

panel <- data.frame(RespondentID,Wave,Value)

library(plm)
pdata <- pdata.frame(panel, index = c("RespondentID", "Wave"))
pdata$Change <- diff(pdata$Value)
pdata
#>          RespondentID Wave Value Change
#> id_001-1       id_001    1     2     NA
#> id_002-1       id_002    1     2     NA
#> id_002-2       id_002    2     3      1
#> id_003-2       id_003    2     4     NA
#> id_004-1       id_004    1     3     NA
#> id_004-2       id_004    2     1     -2
#> id_005-1       id_005    1    NA     NA
#> id_005-2       id_005    2     3     NA

The diff on a pdata.frame's column is a panel-aware diff.

CodePudding user response:

First I create the example data using the data.frame function instead of cbind since I would like to have the 'Value' column to have numeric values.

library(data.table)
RespondentID <- c("id_001","id_002","id_002","id_003", "id_004", "id_004", "id_005", "id_005")
Wave <- c(1,1,2,2,1,2,1,2)
Value <- c(2,2,3,4,3,1,NA,3)
panel <- data.frame(RespondentID,Wave,Value)

panel

I create a new panel by reshaping the panel data into a wide format.

new_panel <- as.data.table(panel)
new_panel <- dcast(new_panel,
                   RespondentID~Wave,
                   fill = NA_integer_,
                   value.var = c("Value"))

I change the column names to something meaningful

colnames(new_panel)[2:3] <- c("Value_1", "Value_2")

The transformed data looks like this.

new_panel
RespondentID Value_1 Value_2
1:       id_001       2      NA
2:       id_002       2       3
3:       id_003      NA       4
4:       id_004       3       1
5:       id_005      NA       3

Now I create the Change column and the result looks like below.

new_panel[,Change:=Value_2 - Value_1]
new_panel
  RespondentID Value_1 Value_2 Change
1:       id_001       2      NA     NA
2:       id_002       2       3      1
3:       id_003      NA       4     NA
4:       id_004       3       1     -2
5:       id_005      NA       3     NA

I hope this helps.

  • Related