I am doing question C12(iii) in chapter 9 of Wooldridge's Introductory Econometrics: A Modern Approach. The question asks the reader to first identify all observations for which the variable 'bs' is greater than 0.5. The question then asks the reader to assign a dummy variable to each of these observations for use in a regression.
I did the first part of the question (identifying all observations for which 'bs' is greater than 0.5) using the following code:
library('wooldridge')
which(elem94_95$bs>0.5)
[1] 68 1127 1508 1670
After checking the data in RStudio, I can confirm that the relevant rows/observations are 68, 1127, 1508, and 1670.
I would like to create a dummy variable for each of these rows/observations, i.e., 'd68', 'd1127', 'd1508', and 'd1670'. How do I do this? My first attempt was the following:
elem94_95$d68<-ifelse(row==68,1,0)
However, this does not work.
CodePudding user response:
I've come up with the following solution:
elem94_95$rownumber<-1:nrow(elem94_95)
elem94_95$d68<-ifelse(elem94_95$rownumber==68,1,0)
elem94_95$d1127<-ifelse(elem94_95$rownumber==1127,1,0)
elem94_95$d1508<-ifelse(elem94_95$rownumber==1508,1,0)
elem94_95$d1670<-ifelse(elem94_95$rownumber==1670,1,0)
However, it feels inelegant. If anyone has a way to include row numbers directly in a formula, I would welcome that solution instead.
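One way to avoid writing a line per row is to loop over the result of which(). This is only a minimal sketch: `dat` is a made-up stand-in for elem94_95 with hypothetical values, so that it runs without the wooldridge package. Replacing `dat` with elem94_95 should give d68, d1127, d1508 and d1670.

```r
# Toy stand-in for elem94_95 (hypothetical values); only the bs column matters here.
dat <- data.frame(bs = c(0.30, 0.66, 0.41, 0.58, 0.25))

# Row positions where bs exceeds 0.5, as in the which() call from the question.
flagged <- which(dat$bs > 0.5)

# Add one 0/1 column per flagged row, named d<row number>.
for (i in flagged) {
  dat[[paste0("d", i)]] <- as.integer(seq_len(nrow(dat)) == i)
}

names(dat)  # "bs" "d2" "d4"
```

This also removes the need for the helper rownumber column, because seq_len(nrow(dat)) generates the row positions on the fly.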
CodePudding user response:
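library(tidyverse)

# Number the rows, then label each row where bs > 0.5 as "d<row number>"
df <- elem94_95 %>%
  as_tibble() %>%
  mutate(
    row = row_number(),
    dummy = if_else(bs > 0.5, str_c("d", row), NA_character_)
  )

# Show only the flagged rows
df %>%
  filter(!is.na(dummy))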
# A tibble: 4 × 16
  distid schid lunch enrol staff exppp avgsal avgben math4 story4    bs lavgsal lenrol lstaff   row dummy
   <dbl> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>   <dbl>  <dbl>  <dbl> <int> <chr>
1   9030   192  40.7   167  85.6  3584  24425  16108  67.9   71.4 0.659    10.1   5.12   4.45    68 d68
2  63160  5783  3.60   411  115.  5394  30304  17418  83.9   92.9 0.575    10.3   6.02   4.75  1127 d1127
3  82010   701  69.4   896  78.3  1353   9297   9295  41.2   48.5  1.00    9.14   6.80   4.36  1508 d1508
4  82040  5357  32.9   304  49.9  3532  50042  25134  57.6   55.9 0.502    10.8   5.72   3.91  1670 d1670
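Note that the dummy column here is a single character label, not four separate 0/1 variables. If the goal is to put one indicator per flagged row into a regression, something like the following base R sketch may help. The data frame `dat` and the variables `y` and `x` are hypothetical placeholders, not the model from the textbook exercise.

```r
# Toy stand-in data (hypothetical values; y and x are placeholder variables).
dat <- data.frame(
  y  = c(1.2, 3.4, 2.1, 4.8, 2.9, 3.3, 1.7, 2.6),
  x  = c(0.5, 1.1, 0.9, 1.8, 1.2, 1.4, 0.7, 1.0),
  bs = c(0.30, 0.66, 0.41, 0.58, 0.25, 0.33, 0.47, 0.39)
)

# Row positions where bs exceeds 0.5.
flagged <- which(dat$bs > 0.5)

# Build all indicators at once: entry [r, j] is 1 when row r is the j-th flagged row.
ind <- 1L * outer(seq_len(nrow(dat)), flagged, "==")
colnames(ind) <- paste0("d", flagged)
dat <- cbind(dat, ind)

# Assemble the formula from the generated names instead of typing each dummy.
f <- reformulate(c("x", colnames(ind)), response = "y")
fit <- lm(f, data = dat)

f  # y ~ x + d2 + d4
```

Because each dummy equals 1 for exactly one observation, the fitted model reproduces those rows exactly, so their residuals are zero (up to rounding).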