I have a dataframe as it follows
Id | Code | year
1 | ZZZ | 2016
1 | KKK | 2016
1 | A23 | 2018
2 | A01 | 2018
2 | KKK | 2016
2 | ddd | 2017
3 | KKK | 2016
3 | ZZZ | 2016
4 | A23 | 2018
4 | 000 | 2018
5 | 009 | 2018
What I need is a table where for each id the table has a count of how many codes has at least a duplicated value in the df, separated by year. This should be an example af the output based on the df shown above.
Id | 2016 | 2017 | 2018 |
1 | 2 | 0 | 1 |
2 | 1 | 0 | 0 |
3 | 2 | 0 | 0 |
4 | 0 | 0 | 1 |
CodePudding user response:
dat <- structure(list(Id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5), Code = c("ZZZ",
"KKK", "A23", "A01", "KKK", "ddd", "KKK", "ZZZ", "A23", "000",
"009"), year = c(2016L, 2016L, 2018L, 2018L, 2016L, 2017L, 2016L,
2016L, 2018L, 2018L, 2018L)), row.names = c(NA, -11L), class = "data.frame")
You can use:
## this helps to keep 2017 column
dat$Year <- as.factor(dat$year)
## keep Code that appears more than once then form contingency table
table(subset(dat, duplicated(Code) | duplicated(Code, fromLast = T), c(Id, Year)))
# Year
#Id 2016 2017 2018
# 1 2 0 1
# 2 1 0 0
# 3 2 0 0
# 4 0 0 1
CodePudding user response:
You could use xtabs
function:
dat$year = factor(dat$year)
# tabulate Id and Year for duplicated Code values
xtabs( ~Id year, dat, subset=duplicated(Code) | duplicated(Code, fromLast=TRUE))
year
Id 2016 2017 2018
1 2 0 1
2 1 0 0
3 2 0 0
4 0 0 1