I am trying to get all duplicated observations. I have looked around, but all the solutions I found seem to work on individual columns. Is it possible to get the entire rows?
My dataset looks like this:
structure(list(CrimeId = c(160903280L, 160912272L, 160912590L,
160912801L, 160912811L, 160913003L), OriginalCrimeTypeName = c("Assault / Battery",
"Homeless Complaint", "Susp Info", "Report", "594", "Ref'd"),
OffenseDate = c("2016-03-30T00:00:00", "2016-03-31T00:00:00",
"2016-03-31T00:00:00", "2016-03-31T00:00:00", "2016-03-31T00:00:00",
"2016-03-31T00:00:00"), CallTime = c("18:42", "15:31", "16:49",
"17:38", "17:42", "18:29"), CallDateTime = c("2016-03-30T18:42:00",
"2016-03-31T15:31:00", "2016-03-31T16:49:00", "2016-03-31T17:38:00",
"2016-03-31T17:42:00", "2016-03-31T18:29:00"), Disposition = c("REP",
"GOA", "GOA", "GOA", "REP", "GOA"), Address = c("100 Block Of Chilton Av",
"2300 Block Of Market St", "2300 Block Of Market St", "500 Block Of 7th St",
"Beale St/bryant St", "16th St/pond St"), City = c("San Francisco",
"San Francisco", "San Francisco", "San Francisco", "San Francisco",
"San Francisco"), State = c("CA", "CA", "CA", "CA", "CA",
"CA"), AgencyId = c("1", "1", "1", "1", "1", "1"), Range = c(NA,
NA, NA, NA, NA, NA), AddressType = c("Premise Address", "Premise Address",
"Premise Address", "Premise Address", "Intersection", "Intersection"
)), row.names = c(NA, 6L), class = "data.frame")
CodePudding user response:
With dplyr, try group_by_all() or its now-recommended equivalent, group_by(across(everything())). The example below uses a slightly extended data set in which I created a duplicated entry (rows 2 and 5).
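library(dplyr)
# df is the extended data set given at the end of this answer
df %>%
group_by(across(everything())) %>% # group rows that are identical in every column
mutate(dup = n()) # dup = number of rows in each group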
...AgencyId Range AddressType dup
... <chr> <lgl> <chr> <int>
...1 1 NA Premise Address 1
...2 1 NA Premise Address 2
...3 1 NA Premise Address 1
...4 1 NA Premise Address 1
...5 1 NA Premise Address 2
...6 1 NA Intersection 1
...7 1 NA Intersection 1
(only showing the last 4 columns)
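To return only the duplicated rows instead of a count column, one option is to filter on the group size. This is a minimal sketch, assuming a small made-up data frame called toy (a hypothetical stand-in, not the question's data):

```r
library(dplyr)

# Hypothetical toy data: rows 2 and 4 are identical in every column
toy <- data.frame(
  id   = c(1L, 2L, 3L, 2L),
  type = c("a", "b", "c", "b"),
  stringsAsFactors = FALSE
)

dups <- toy %>%
  group_by(across(everything())) %>%  # one group per distinct row
  filter(n() > 1) %>%                 # keep groups with more than one member
  ungroup()

print(dups)
```

Every copy of a duplicated row is kept, including the first occurrence.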
Extended data:
df <- structure(list(CrimeId = c(160903280L, 160912272L, 160912590L,
160912801L, 160912272L, 160912811L, 160913003L), OriginalCrimeTypeName = c("Assault / Battery",
"Homeless Complaint", "Susp Info", "Report", "Homeless Complaint",
"594", "Ref'd"), OffenseDate = c("2016-03-30T00:00:00", "2016-03-31T00:00:00",
"2016-03-31T00:00:00", "2016-03-31T00:00:00", "2016-03-31T00:00:00",
"2016-03-31T00:00:00", "2016-03-31T00:00:00"), CallTime = c("18:42",
"15:31", "16:49", "17:38", "15:31", "17:42", "18:29"), CallDateTime = c("2016-03-30T18:42:00",
"2016-03-31T15:31:00", "2016-03-31T16:49:00", "2016-03-31T17:38:00",
"2016-03-31T15:31:00", "2016-03-31T17:42:00", "2016-03-31T18:29:00"
), Disposition = c("REP", "GOA", "GOA", "GOA", "GOA", "REP",
"GOA"), Address = c("100 Block Of Chilton Av", "2300 Block Of Market St",
"2300 Block Of Market St", "500 Block Of 7th St", "2300 Block Of Market St",
"Beale St/bryant St", "16th St/pond St"), City = c("San Francisco",
"San Francisco", "San Francisco", "San Francisco", "San Francisco",
"San Francisco", "San Francisco"), State = c("CA", "CA", "CA",
"CA", "CA", "CA", "CA"), AgencyId = c("1", "1", "1", "1", "1",
"1", "1"), Range = c(NA, NA, NA, NA, NA, NA, NA), AddressType = c("Premise Address",
"Premise Address", "Premise Address", "Premise Address", "Premise Address",
"Intersection", "Intersection")), row.names = c("1", "2", "3",
"4", "21", "5", "6"), class = "data.frame")
CodePudding user response:
With library(dplyr) you can do your_data %>% add_count(across(everything())) to add a count grouped by every column.
Demo:
mtcars[c(1, 1, 2, 3, 2, 3, 3), ] %>%
add_count(across(everything()))
# mpg cyl disp hp drat wt qsec vs am gear carb n
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# 2 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# 3 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2
# 4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# 5 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 2
# 6 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# 7 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
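If only the duplicated rows themselves are wanted, the count can be used as a filter and then dropped. The sketch below assumes a different row selection from mtcars than the demo above, chosen so that some rows are unique. It also shows base R's duplicated() as a cross-check, which is an alternative not mentioned in the answers:

```r
library(dplyr)

# Assumed variation of the demo: rows 1 and 3 appear twice, rows 2 and 4 once
demo <- mtcars[c(1, 1, 2, 3, 3, 4), ]

# Count identical rows, keep those occurring more than once, drop the helper column
dups <- demo %>%
  add_count(across(everything())) %>%
  filter(n > 1) %>%
  select(-n)

# Base R cross-check: a row is flagged if it has a twin earlier or later in the data
is_dup <- duplicated(demo) | duplicated(demo, fromLast = TRUE)
base_dups <- demo[is_dup, ]

print(nrow(dups))
print(nrow(base_dups))
```

Calling duplicated() on its own marks only the second and later copies, which is why the fromLast = TRUE pass is added to catch the first occurrence as well.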