I have a data frame `df` with multiple columns, let's say `x`, `y` and `z`:
x | y | z |
---|---|---|
43 | 5666 | 654 |
54 | 545 | 645 |
43 | 7864 | 654 |
25 | 65654 | 987 |
21 | 5445 | 789 |
67 | 98 | 89 |
25 | 64 | 986 |
78 | 6465 | 68 |
21 | 546 | 64 |
Now I need only those rows whose value in column `x` appears exactly twice:
x | y | z |
---|---|---|
43 | 5666 | 654 |
43 | 7864 | 654 |
25 | 65654 | 987 |
21 | 5445 | 789 |
25 | 64 | 986 |
21 | 546 | 64 |
I have been looking around, but I only found code for Python, not for R.
Answer:
A `dplyr` way:
```r
set.seed(100)
library(dplyr)
df <- data.frame(x = sample(1:5, 10, TRUE),
                 y = sample(10:15, 10, TRUE),
                 z = sample(20:25, 10, TRUE))
df
#>    x  y  z
#> 1  2 13 23
#> 2  3 12 24
#> 3  1 12 22
#> 4  2 11 20
#> 5  4 10 22
#> 6  4 11 23
#> 7  2 12 21
#> 8  3 13 25
#> 9  2 13 24
#> 10 5 15 24

# Keep the rows whose x group has exactly two members
# (the native |> pipe needs R >= 4.1)
df |>
  group_by(x) |>
  filter(n() == 2)
#> # A tibble: 4 × 3
#> # Groups:   x [2]
#>       x     y     z
#>   <int> <int> <int>
#> 1     3    12    24
#> 2     4    10    22
#> 3     4    11    23
#> 4     3    13    25
```
Created on 2022-10-03 with reprex v2.0.2
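For readers without `dplyr`, the same group-then-filter idea can be sketched in base R on the question's own data. This is a swapped-in base R technique, not part of the answer above: `split()` plays the role of `group_by(x)` and `Filter()` keeps only the groups with two rows, like `filter(n() == 2)`. One difference to be aware of is that the result comes out ordered by `x` value rather than in the original row order.

```r
# The question's data, typed in from the table in the question
df <- data.frame(
  x = c(43, 54, 43, 25, 21, 67, 25, 78, 21),
  y = c(5666, 545, 7864, 65654, 5445, 98, 64, 6465, 546),
  z = c(654, 645, 654, 987, 789, 89, 986, 68, 64)
)

# One data frame per distinct x value (the analogue of group_by(x))
groups <- split(df, df$x)

# Keep only the groups with exactly two rows (the analogue of filter(n() == 2))
kept <- Filter(function(g) nrow(g) == 2, groups)

# Stack the surviving groups back together; rows come out sorted by x
result <- do.call(rbind, unname(kept))
result
```

Splitting materialises every group as its own data frame, so on large data the vectorised answers below (`ave()` or `table()`) are likely to be cheaper.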
Answer:
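Using base R's `ave()`, which gives every row the number of rows sharing its `x` value, and keeping the rows where that count is 2: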
```r
# ave() replaces each x by the size of its x group; keep groups of size 2
dat[with(dat, ave(x, x, FUN = length) == 2), ]
#    x     y   z
# 1 43  5666 654
# 3 43  7864 654
# 4 25 65654 987
# 5 21  5445 789
# 7 25    64 986
# 9 21   546  64
```
Data:
```r
dat <- structure(list(x = c(43L, 54L, 43L, 25L, 21L, 67L, 25L, 78L,
21L), y = c(5666L, 545L, 7864L, 65654L, 5445L, 98L, 64L, 6465L,
546L), z = c(654L, 645L, 654L, 987L, 789L, 89L, 986L, 68L, 64L
)), class = "data.frame", row.names = c(NA, -9L))
```
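To see what the `ave()` call is doing, it can help to look at the intermediate vector on its own. The sketch below rebuilds the same `dat` with `data.frame()` and prints the per-row group sizes before using them as a row filter. The `seq_along()` variant at the end is an addition of this sketch, not part of the answer: passing row positions as the first argument should keep the counts integer even if `x` were a character or factor column.

```r
# Same values as the dat used in the answer above
dat <- data.frame(
  x = c(43L, 54L, 43L, 25L, 21L, 67L, 25L, 78L, 21L),
  y = c(5666L, 545L, 7864L, 65654L, 5445L, 98L, 64L, 6465L, 546L),
  z = c(654L, 645L, 654L, 987L, 789L, 89L, 986L, 68L, 64L)
)

# ave() splits dat$x by its own values, applies length() to each group,
# and writes each group's size back into every position of that group
counts <- ave(dat$x, dat$x, FUN = length)
counts
# [1] 2 1 2 2 2 1 2 1 2

# Comparing with 2 gives the logical row index used in the answer
keep <- counts == 2
dat[keep, ]

# Variant: count row positions instead of the x values themselves
counts2 <- ave(seq_along(dat$x), dat$x, FUN = length)
```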
Answer:
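A base R approach using `table()`:

```r
set.seed(100)
df <- data.frame(x = sample(1:5, 10, TRUE),
                 y = sample(10:15, 10, TRUE),
                 z = sample(20:25, 10, TRUE))
# values of x that occur exactly twice (table() names are character)
twice <- as.numeric(names(which(table(df$x) == 2)))
# n -> indices of the rows holding one of those values
n <- which(df$x %in% twice)
df[n, ]
```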
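All three answers keep rows whose value occurs *exactly* twice (`n() == 2`, `== 2`). If "appear twice" should instead mean "appears more than once", base R's `duplicated()` scanned from both ends is a common idiom. The sketch below wraps both readings in two small helpers; `keep_by_count()` and `keep_repeated()` are hypothetical names made up for this illustration, not functions from any package. On the question's data both readings happen to select the same six rows, so the example uses a toy column where `3` appears three times.

```r
# Hypothetical helper (not from any package): keep rows of `data` whose
# value in column `col` occurs exactly `k` times
keep_by_count <- function(data, col, k = 2) {
  counts <- table(data[[col]])
  # Look up each row's count via the value's character form (the table's names)
  per_row <- as.vector(counts[as.character(data[[col]])])
  data[per_row == k, , drop = FALSE]
}

# Hypothetical helper: keep rows whose value occurs at least twice
keep_repeated <- function(data, col) {
  v <- data[[col]]
  # duplicated(v) flags all but the first occurrence, fromLast = TRUE all but
  # the last; together they flag every occurrence of a repeated value
  data[duplicated(v) | duplicated(v, fromLast = TRUE), , drop = FALSE]
}

toy <- data.frame(x = c(1, 2, 2, 3, 3, 3), y = letters[1:6])
keep_by_count(toy, "x", 2)  # rows 2 and 3 (x == 2)
keep_repeated(toy, "x")     # rows 2 to 6 (x == 2 and x == 3)
```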