I use the code:
student_data1 <- student_data1[!(student_data1$gvkey == 6310),] %>%
head()
to delete the company with the gvkey 6310, but it is deleting everything else and keeps 6310.
How do I need to change the code, and what would it look like if I wanted to delete both 6310 and 9555?
Thank you in advance! :)
CodePudding user response:
It’s always helpful to have data we can look at to know for sure what’s going on. In the future, you can share some of your data by running something like dput(head(student_data1, 10)) and pasting the output into your question. We'll generate some data to show an example here.
student_data1 <-
data.frame(
gvkey = rep(c(6310 , 9555, 2222, 11, 2), each = 10),
Var1 = rnorm(50)
)
head(student_data1, 5)
#> gvkey Var1
#> 1 6310 0.065167828
#> 2 6310 0.334672998
#> 3 6310 -0.459434631
#> 4 6310 -0.002706843
#> 5 6310 0.596642565
nrow(student_data1)
#> [1] 50
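As an aside on the dput() suggestion above, here is a minimal sketch of what that round trip looks like. The data frame example_df and its values are made up for illustration; they are not the asker's data.

```r
# Hypothetical small data frame standing in for the asker's data
example_df <- data.frame(
  gvkey = c(6310L, 9555L, 2222L),
  Var1  = c(0.5, -1.25, 2)
)

# dput() prints R code that recreates the object;
# that printed text is what you would paste into a question
dput(example_df)

# Round trip: capture the printed text and evaluate it to rebuild the object
dput_text <- paste(capture.output(dput(example_df)), collapse = "\n")
rebuilt   <- eval(parse(text = dput_text))

# Compare the rebuilt data frame with the original
all.equal(example_df, rebuilt)
```

Anyone reading the question can then assign the pasted structure(...) text to a name and work with the same rows you have.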
From the code you’ve posted, it looks like it should give you what you want for removing just gvkey 6310 with the syntax you've used, although generally we would use != instead of !(==). The only thing I can speculate is that perhaps you've missed the ! in your actual script.
df <- student_data1[!(student_data1$gvkey == 6310) , ]
head(df, 5)
#> gvkey Var1
#> 11 9555 -0.1338284
#> 12 9555 -3.4963800
#> 13 9555 0.7090384
#> 14 9555 -0.5466933
#> 15 9555 -1.5392845
nrow(df)
#> [1] 40
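To make the != remark concrete, the sketch below rebuilds the same kind of toy data (random values, so Var1 will differ from the output above) and checks that both spellings keep the same rows. It also shows one thing worth ruling out in the original script: the posted line pipes into head() as part of the assignment, and since %>% binds more tightly than <-, the stored result would be only the first six rows. The head() call here is written without the pipe so the example needs no extra packages; the names df_not_eq, df_neq, and df_truncated are just for illustration.

```r
# Hypothetical toy data mirroring the example above (values are random)
student_data1 <- data.frame(
  gvkey = rep(c(6310, 9555, 2222, 11, 2), each = 10),
  Var1  = rnorm(50)
)

# Both forms drop every row whose gvkey is 6310
df_not_eq <- student_data1[!(student_data1$gvkey == 6310), ]
df_neq    <- student_data1[student_data1$gvkey != 6310, ]
identical(df_not_eq, df_neq)
#> [1] TRUE

# Wrapping the right-hand side in head(), which is what `... %>% head()` does,
# stores only the first six filtered rows instead of all of them
df_truncated <- head(student_data1[student_data1$gvkey != 6310, ])
nrow(df_neq)
#> [1] 40
nrow(df_truncated)
#> [1] 6
```

If the goal is only to preview the result, calling head() on a separate line after the assignment leaves the filtered data intact.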
To remove multiple values, it’s often easiest to use the %in% operator.
df <- student_data1[!student_data1$gvkey %in% c(6310, 9555) , ]
head(df, 5)
#> gvkey Var1
#> 21 2222 2.9606101
#> 22 2222 0.7001521
#> 23 2222 0.1065952
#> 24 2222 0.7103071
#> 25 2222 -0.3279968
nrow(df)
#> [1] 30
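One practical difference between the two approaches shows up when gvkey contains missing values. This is a small sketch on made-up data (toy and drop_keys are hypothetical names, and the real data may have no NAs at all): %in% never returns NA, so rows with a missing key are simply kept, while != returns NA for them and the subset then contains an all-NA row.

```r
# Hypothetical data with one missing gvkey
toy <- data.frame(
  gvkey = c(6310, 9555, NA, 2222),
  Var1  = c(1, 2, 3, 4)
)

# Keys to remove, kept in one vector so the list is easy to extend
drop_keys <- c(6310, 9555)

# %in% treats NA as "not in the set", so the NA row is kept unchanged
kept_in <- toy[!toy$gvkey %in% drop_keys, ]
kept_in$Var1
#> [1] 3 4

# != yields NA for the missing key, which produces a row filled with NA
kept_neq <- toy[toy$gvkey != 6310, ]
kept_neq$Var1
#> [1]  2 NA  4
```

If rows with a missing gvkey should be dropped as well, adding !is.na(toy$gvkey) to the condition would be one way to do it.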
Created on 2021-12-08 by the reprex package (v2.0.1)