How do I create dummy variables for specific rows/observations in a dataframe?

Time:12-31

I am doing question C12(iii) in chapter 9 of Wooldridge's Introductory Econometrics: A Modern Approach. The question asks the reader to first identify all observations for which the variable 'bs' is greater than 0.5. The question then asks the reader to assign a dummy variable to each of these observations for use in a regression.

I performed the first part of the question (identifying all observations for which 'bs' is greater than 0.5) using the following code:

library('wooldridge')
which(elem94_95$bs>0.5)
[1]   68 1127 1508 1670

After looking at the output this produces in RStudio, I find that the relevant rows/observations are 68; 1,127; 1,508; and 1,670.

I would like to create a dummy variable for each of these rows/observations, i.e., 'd68', 'd1127', 'd1508', and 'd1670'. How do I do this? My first, intuitive attempt was the following:

elem94_95$d68<-ifelse(row==68,1,0)

However, this does not work.

CodePudding user response:

I've come up with the following solution:

elem94_95$rownumber<-1:nrow(elem94_95)
elem94_95$d68<-ifelse(elem94_95$rownumber==68,1,0)
elem94_95$d1127<-ifelse(elem94_95$rownumber==1127,1,0)
elem94_95$d1508<-ifelse(elem94_95$rownumber==1508,1,0)
elem94_95$d1670<-ifelse(elem94_95$rownumber==1670,1,0)

However, it feels inelegant. If anyone else has a way to directly include row numbers in a formula I would welcome that solution instead.
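One way to avoid writing a line per row is to loop over the indices that which() returns. This is only a minimal sketch: it uses a small made-up data frame named dat (hypothetical values, standing in for elem94_95) so that it runs without the wooldridge package.

```r
# Toy stand-in for elem94_95 (hypothetical values); only 'bs' matters here.
dat <- data.frame(bs = c(0.30, 0.62, 0.41, 0.55, 0.20))

# Row positions where bs exceeds 0.5
idx <- which(dat$bs > 0.5)

# Add one 0/1 column per flagged row, named "d" followed by the row number
for (i in idx) {
  dat[[paste0("d", i)]] <- as.integer(seq_len(nrow(dat)) == i)
}

names(dat)
```

With the real data, replacing dat with elem94_95 should give the columns d68, d1127, d1508 and d1670 without a helper rownumber column.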

CodePudding user response:

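library(tidyverse)
library(wooldridge)  # provides elem94_95

# Label each row with bs > 0.5 as "d<row number>"; all other rows get NA
df <- elem94_95 %>%
  as_tibble() %>%
  mutate(row = row_number(),
         dummy = if_else(
           bs > 0.5, str_c("d", row), NA_character_
         ))

# Show only the flagged rows
df %>%
  filter(!is.na(dummy))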

# A tibble: 4 × 16
  distid schid lunch enrol staff exppp avgsal avgben math4 story4    bs lavgsal lenrol lstaff   row dummy
   <dbl> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>   <dbl>  <dbl>  <dbl> <int> <chr>
1   9030   192 40.7    167  85.6  3584  24425  16108  67.9   71.4 0.659   10.1    5.12   4.45    68 d68  
2  63160  5783  3.60   411 115.   5394  30304  17418  83.9   92.9 0.575   10.3    6.02   4.75  1127 d1127
3  82010   701 69.4    896  78.3  1353   9297   9295  41.2   48.5 1.00     9.14   6.80   4.36  1508 d1508
4  82040  5357 32.9    304  49.9  3532  50042  25134  57.6   55.9 0.502   10.8    5.72   3.91  1670 d1670
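Note that dummy above is a single character column that is NA for every unflagged row, so under lm()'s default NA handling those rows would be dropped if it entered a formula directly. A possible workaround, sketched here on made-up data, is to give the unflagged rows a baseline label and pass the column as a factor, which lm() expands into one indicator per flagged row. The data frame dat, the outcome y and the label "base" are all assumptions for illustration, not part of the original answer.

```r
# Toy data (hypothetical values); 'y' stands in for whatever outcome is regressed.
dat <- data.frame(
  y  = c(1.0, 2.5, 1.7, 3.1, 0.9, 2.2),
  bs = c(0.30, 0.62, 0.41, 0.55, 0.20, 0.35)
)

# Label flagged rows "d<row number>" and every other row "base"
row_id <- seq_len(nrow(dat))
dat$dummy <- ifelse(dat$bs > 0.5, paste0("d", row_id), "base")

# Make "base" the reference level so each flagged row gets its own coefficient
dat$dummy <- relevel(factor(dat$dummy), ref = "base")

fit <- lm(y ~ bs + dummy, data = dat)
names(coef(fit))
```

Each flagged observation then has its own coefficient, so it is fitted exactly and does not influence the intercept or the slope on bs.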