Deleting everything but relevant Data

Time:12-10

I use the code:

student_data1 <- student_data1[!(student_data1$gvkey == 6310), ] %>%
  head()

to delete the company with gvkey 6310, but instead it deletes everything else and keeps only 6310.

How do I need to change the code, and what would the code look like if I want to delete both 6310 and 9555?

Thank you in advance! :)

User response:

It’s always helpful to have some of your data to look at so we can know for sure what’s going on. In future, you can share some of your data by running something like dput(head(student_data1, 10)) and pasting its output into your question. We'll generate some data to show an example here.
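As a rough illustration of why dput() output is handy to paste into a question: the text it prints is itself R code that rebuilds the object, so anyone can recreate your data exactly. The small data frame below is made up purely for illustration.

```r
# Hypothetical example data, made up only for illustration
example_data <- data.frame(
  gvkey = c(6310, 6310, 9555),
  Var1 = c(0.5, -1.25, 2)
)

# dput() prints R code that recreates the object; capture that text
dput_text <- capture.output(dput(example_data))
cat(dput_text, sep = "\n")

# Pasting the same text back into R rebuilds an equivalent data frame
rebuilt <- eval(parse(text = dput_text))
all.equal(rebuilt, example_data)
```

The printed text is what you would copy into your question.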

student_data1 <-
  data.frame(
    gvkey = rep(c(6310 , 9555, 2222, 11, 2), each = 10),
    Var1 = rnorm(50)
  )

head(student_data1, 5)
#>   gvkey         Var1
#> 1  6310  0.065167828
#> 2  6310  0.334672998
#> 3  6310 -0.459434631
#> 4  6310 -0.002706843
#> 5  6310  0.596642565

nrow(student_data1)
#> [1] 50
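The data above is generated with rnorm() and no seed, so rerunning it gives different Var1 values each time. If you want reproducible numbers, a minimal sketch is to call set.seed() first. The seed value 42 is arbitrary, and the resulting values will not match the ones printed in this answer.

```r
# Fix the random number generator state so rnorm() returns the same values each run
set.seed(42)  # arbitrary seed, chosen only for illustration
student_data1 <- data.frame(
  gvkey = rep(c(6310, 9555, 2222, 11, 2), each = 10),
  Var1 = rnorm(50)
)

# Re-seeding and drawing again reproduces exactly the same column
set.seed(42)
identical(student_data1$Var1, rnorm(50))
```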

From the code you’ve posted, it looks like it should give you what you want for removing just gvkey 6310, although generally we would write != rather than !(==). The only thing I can speculate is that you may have left out the ! in your actual script.

df <- student_data1[!(student_data1$gvkey == 6310) , ]

head(df, 5)
#>    gvkey       Var1
#> 11  9555 -0.1338284
#> 12  9555 -3.4963800
#> 13  9555  0.7090384
#> 14  9555 -0.5466933
#> 15  9555 -1.5392845

nrow(df)
#> [1] 40
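A small base R sketch of two related details, using made-up data with deterministic Var1 values. First, != and !(==) build the same logical index. Second, the %>% head() at the end of the code in the question: x %>% head() is the same as head(x), which returns only the first six rows, so assigning that result back to student_data1 would overwrite the data frame with just those six rows.

```r
# Made-up data in the same shape as the example above (Var1 is just 1..50)
student_data1 <- data.frame(
  gvkey = rep(c(6310, 9555, 2222, 11, 2), each = 10),
  Var1 = seq_len(50)
)

# `!=` and `!(... == ...)` produce identical logical indexes
keep_a <- student_data1$gvkey != 6310
keep_b <- !(student_data1$gvkey == 6310)
identical(keep_a, keep_b)

# Filtering with `!=` drops the 10 rows with gvkey 6310, leaving 40
df <- student_data1[student_data1$gvkey != 6310, ]
nrow(df)

# Wrapping the result in head() (what `%>% head()` does) keeps only 6 rows;
# assigning this back to student_data1 would lose the rest of the data
filtered_head <- head(student_data1[student_data1$gvkey != 6310, ])
nrow(filtered_head)
```

If you only want to look at the first rows, call head() separately instead of including it in the assignment.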

To remove multiple values it’s often easiest to use the %in% operator.

df <- student_data1[!student_data1$gvkey %in% c(6310, 9555) , ]

head(df, 5)
#>    gvkey       Var1
#> 21  2222  2.9606101
#> 22  2222  0.7001521
#> 23  2222  0.1065952
#> 24  2222  0.7103071
#> 25  2222 -0.3279968

nrow(df)
#> [1] 30
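One more detail worth knowing if gvkey can ever be missing. With !=, an NA gvkey gives an NA in the index, and indexing a data frame with NA returns a row filled with NAs. %in% treats NA as "not in the set", so !gvkey %in% ... keeps that row intact. Base R's subset() treats an NA condition as FALSE and drops the row. The data below, including the NA, is made up for illustration.

```r
# Made-up data with one missing gvkey
d <- data.frame(gvkey = c(6310, 9555, NA, 2222), Var1 = 1:4)

# `!=` yields NA for the missing gvkey, so that position becomes an all-NA row
by_neq <- d[d$gvkey != 6310, ]

# `%in%` returns FALSE for NA, so `!` keeps the NA-gvkey row unchanged
by_in <- d[!d$gvkey %in% c(6310, 9555), ]

# subset() drops rows where the condition evaluates to NA
by_subset <- subset(d, gvkey != 6310)

by_neq
by_in
by_subset
```

Which behaviour is right depends on whether rows with a missing gvkey should be kept.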

Created on 2021-12-08 by the reprex package (v2.0.1)
