Is it possible to remove rows by matching specific character strings or factor levels in two or more columns? For small datasets this is easy, because I can scroll through the data frame and remove the row I want. How can I do the same for larger datasets without scrolling to find the rows that match my criteria?
Fake data:
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))
For example: How do I remove only site "1" in March of 2019 in one line of code and without looking at which row it's in?
CodePudding user response:
You can use subset():
df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))
subset(df1, !(site == "1" & year == 2019 & month == "March"))
#> year month site common_name num
#> 2 2019 October 1 shark 0
#> 3 2019 March 2 Tuna 1
#> 4 2019 October 2 shark 0
#> 5 2019 March 3 Tuna 0
#> 6 2019 October 3 shark 0
#> 7 2019 March 4 Tuna 2
#> 8 2019 October 4 shark 2
#> 9 2019 March 5 Tuna 0
#> 10 2019 October 5 shark 2
#> 11 2020 March 1 Tuna 1
#> 12 2020 October 1 shark 1
#> 13 2020 March 2 Tuna 2
#> 14 2020 October 2 shark 2
#> 15 2020 March 3 Tuna 1
#> 16 2020 October 3 shark 0
#> 17 2020 March 4 Tuna 1
#> 18 2020 October 4 shark 0
#> 19 2020 March 5 Tuna 0
#> 20 2020 October 5 shark 2
Created on 2022-05-31 by the reprex package (v2.0.1)
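With a larger data frame it can help to confirm the filter did what was intended without reading the printed output. The following is a minimal sketch of such a check, not part of the original answer; the set.seed() call and the name kept are assumptions added for illustration.

```r
set.seed(1)  # assumption: fixed seed so the random num column is reproducible

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Same filter as above, stored under a hypothetical name
kept <- subset(df1, !(site == "1" & year == 2019 & month == "March"))

# How many rows were dropped?
n_dropped <- nrow(df1) - nrow(kept)

# Does any remaining row still match all three conditions?
still_present <- any(kept$site == "1" & kept$year == 2019 & kept$month == "March")

n_dropped
still_present
```

For this fake data, n_dropped should be 1 and still_present should be FALSE, since exactly one row combines site "1", 2019 and March.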
CodePudding user response:
We could use paste() as well:
subset(df1, paste(year, month, site) != '2019 March 1')
Output (the num values differ from the previous answer because sample() is random):
year month site common_name num
2 2019 October 1 shark 1
3 2019 March 2 Tuna 1
4 2019 October 2 shark 2
5 2019 March 3 Tuna 0
6 2019 October 3 shark 0
7 2019 March 4 Tuna 2
8 2019 October 4 shark 1
9 2019 March 5 Tuna 1
10 2019 October 5 shark 1
11 2020 March 1 Tuna 1
12 2020 October 1 shark 1
13 2020 March 2 Tuna 1
14 2020 October 2 shark 2
15 2020 March 3 Tuna 1
16 2020 October 3 shark 0
17 2020 March 4 Tuna 1
18 2020 October 4 shark 1
19 2020 March 5 Tuna 1
20 2020 October 5 shark 2
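The paste() idea extends to dropping several combinations at once by comparing against a vector of keys with %in%. This is a sketch under assumptions: the second key ("2020 October 5") and the names drop_keys and kept are made up for illustration, and the keys must use the same separator that paste() produces (a single space by default).

```r
set.seed(1)  # assumption: fixed seed so the random num column is reproducible

df1 <- data.frame(year = rep(c(2019, 2020), each = 10),
                  month = rep(c("March", "October"), each = 1),
                  site = rep(c("1", "2", "3", "4", "5"), each = 2),
                  common_name = rep(c("Tuna", "shark"), each = 1),
                  num = sample(x = 0:2, size = 20, replace = TRUE))

# Combinations to remove, written as "year month site";
# the second entry is a hypothetical extra example
drop_keys <- c("2019 March 1", "2020 October 5")

# Keep every row whose pasted key is not in drop_keys
kept <- subset(df1, !(paste(year, month, site) %in% drop_keys))

nrow(kept)
```

If a column could itself contain spaces, passing a separator that never appears in the data (for example sep = "|") to paste() avoids two different combinations producing the same key.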
CodePudding user response:
A one-line alternative to subset() or dplyr::filter(), using R's bracket notation:
df2 <- df1[!(df1$site=="1" & df1$year==2019 & df1$month=="March"),]
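One difference from subset() worth knowing: if any of the compared columns contains NA, the logical index can contain NA too, and bracket indexing then returns an all-NA row, whereas subset() treats NA as FALSE and drops that row. The sketch below illustrates this on a small toy data frame; df_na, drop, with_na_row and clean are hypothetical names, and the data are not from the question.

```r
# Toy data with one missing month (hypothetical, not from the question)
df_na <- data.frame(year  = c(2019, 2019, 2020),
                    month = c("March", NA, "March"),
                    site  = c("1", "1", "1"))

# TRUE for rows to drop, NA where the comparison cannot be decided
drop <- df_na$site == "1" & df_na$year == 2019 & df_na$month == "March"

# Plain negation leaves an NA in the index, which yields an all-NA row
with_na_row <- df_na[!drop, ]

# %in% TRUE maps NA to FALSE, so the undecided row is kept unchanged
clean <- df_na[!(drop %in% TRUE), ]

with_na_row
clean
```

Whether an undecided row should be kept or dropped depends on the analysis; the %in% TRUE form keeps it, while subset() would drop it.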