I have a database of animals that have been tested. Animals are grouped in herds. Each herd can be tested multiple times. I would like to make a new column that will tell me if it was the first time or the second that this herd has been tested.
This is an example of my database:
df <- data.frame(
animal = c("Animal1", "Animal2", "Animal3", "Animal1", "Animal2", "Animal3", "Animal4", "Animal5", "Animal6", "Animal4", "Animal5", "Animal6"),
herd = c("Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1","Herd2","Herd2", "Herd2","Herd2","Herd2","Herd2"),
date = c("2017-01-01", "2017-01-01", "2017-01-01", "2018-07-01" , "2018-07-01", "2018-07-01", "2017-05-01", "2017-05-01", "2017-05-01", "2019-07-01", "2019-07-01", "2019-07-01"))
So, i would like it to look like this
animal herd date testing
1 Animal1 Herd1 2017-01-01 1
2 Animal2 Herd1 2017-01-01 1
3 Animal3 Herd1 2017-01-01 1
4 Animal1 Herd1 2018-07-01 2
5 Animal2 Herd1 2018-07-01 2
6 Animal3 Herd1 2018-07-01 2
7 Animal4 Herd2 2017-05-01 1
8 Animal5 Herd2 2017-05-01 1
9 Animal6 Herd2 2017-05-01 1
10 Animal4 Herd2 2019-07-01 2
11 Animal5 Herd2 2019-07-01 2
12 Animal6 Herd2 2019-07-01 2
I've tried this but is not exactly what I would like, and gets really messy with the whole database
df <- df %>%
group_by(herd) %>%
mutate(testing = rank(date))
> df
# A tibble: 12 x 4
# Groups: herd [2]
animal herd date testing
<fct> <fct> <fct> <dbl>
1 Animal1 Herd1 2017-01-01 2
2 Animal2 Herd1 2017-01-01 2
3 Animal3 Herd1 2017-01-01 2
4 Animal1 Herd1 2018-07-01 5
5 Animal2 Herd1 2018-07-01 5
6 Animal3 Herd1 2018-07-01 5
7 Animal4 Herd2 2017-05-01 2
8 Animal5 Herd2 2017-05-01 2
9 Animal6 Herd2 2017-05-01 2
10 Animal4 Herd2 2019-07-01 5
11 Animal5 Herd2 2019-07-01 5
12 Animal6 Herd2 2019-07-01 5
Thanks for your help!
CodePudding user response:
You can use dplyr::dense_rank
:
df %>%
group_by(herd) %>%
mutate(testing = dense_rank(date))
output
animal herd date testing
<chr> <chr> <chr> <int>
1 Animal1 Herd1 2017-01-01 1
2 Animal2 Herd1 2017-01-01 1
3 Animal3 Herd1 2017-01-01 1
4 Animal1 Herd1 2018-07-01 2
5 Animal2 Herd1 2018-07-01 2
6 Animal3 Herd1 2018-07-01 2
7 Animal4 Herd2 2017-05-01 1
8 Animal5 Herd2 2017-05-01 1
9 Animal6 Herd2 2017-05-01 1
10 Animal4 Herd2 2019-07-01 2
11 Animal5 Herd2 2019-07-01 2
12 Animal6 Herd2 2019-07-01 2
CodePudding user response:
A one-liner using seq
in ave
.
transform(df, testing=ave(date, herd, animal, FUN=seq.int))
# animal herd date testing
# 1 Animal1 Herd1 2017-01-01 1
# 2 Animal2 Herd1 2017-01-01 1
# 3 Animal3 Herd1 2017-01-01 1
# 4 Animal1 Herd1 2018-07-01 2
# 5 Animal2 Herd1 2018-07-01 2
# 6 Animal3 Herd1 2018-07-01 2
# 7 Animal4 Herd2 2017-05-01 1
# 8 Animal5 Herd2 2017-05-01 1
# 9 Animal6 Herd2 2017-05-01 1
# 10 Animal4 Herd2 2019-07-01 2
# 11 Animal5 Herd2 2019-07-01 2
# 12 Animal6 Herd2 2019-07-01 2
CodePudding user response:
You could use data.table
like this:
library("data.table")
setDT(df)
df[,testing := .N, by = list(animal,herd)][,testing := seq_along(testing), by = list(animal,herd)]
Or in another way:
library("data.table")
setDT(df)
df[,testing := NA][,testing := seq_along(testing), by = list(animal,herd)]
In both cases you use the data.table
package, its dt[i,j,by]
syntax and its chaining usage dt[first][second]