I have a dataframe consisting of surveys that are done for specific subjects on specific days. Looking something like this:
subject_id | day | data |
---|---|---|
01 | 1 | 34 |
02 | 1 | 54 |
03 | 3 | 55 |
04 | 4 | 56 |
However, sometimes the survey is done two times in a day, resulting in duplicate Subject_id and Day values, but different data (in reality the data spans multiple collumns).
subject_id | day | data |
---|---|---|
01 | 1 | 34 |
01 | 1 | 58 |
02 | 1 | 54 |
03 | 3 | 55 |
04 | 4 | 56 |
The dataframe is sorted by time, such that the second row represents the second survey on day 1 for subject 1. What I would like to do is to add a column called name that adds "- 2" to every second duplicate subject_id and day and adds nothing (or "- 1" is also acceptable) for non duplicates. In the end the data should like this.
subject_id | day | data | name |
---|---|---|---|
01 | 1 | 34 | 1 |
01 | 1 | 58 | 1 -2 |
02 | 1 | 54 | 1 |
03 | 3 | 55 | 3 |
04 | 4 | 56 | 4 |
I have tried using dplyr and janitor to mark the duplicates. However after many hours I still failed to make the new column as described.
CodePudding user response:
library(data.table)
library(tidyverse)
df <- fread("subject_id day data
01 1 34
01 1 58
02 1 54
03 3 55
04 4 56") %>%
tibble()
df
#> # A tibble: 5 × 3
#> subject_id day data
#> <int> <int> <int>
#> 1 1 1 34
#> 2 1 1 58
#> 3 2 1 54
#> 4 3 3 55
#> 5 4 4 56
df %>%
group_by(subject_id) %>%
mutate(name = ifelse(order(day) == 1,
as.character(order(day)),
paste(day, "-", order(day)))
)
#> # A tibble: 5 × 4
#> # Groups: subject_id [4]
#> subject_id day data name
#> <int> <int> <int> <chr>
#> 1 1 1 34 1
#> 2 1 1 58 1 - 2
#> 3 2 1 54 1
#> 4 3 3 55 1
#> 5 4 4 56 1
Created on 2022-06-23 by the reprex package (v2.0.1)
CodePudding user response:
Try this
df |> group_by(subject_id , day) |>
mutate(name = ifelse( duplicated(day) , paste0(day , "-" , cumsum(duplicated(day)) 1)
, as.character(day)))
CodePudding user response:
If survey
are your data as data.frame
duplicatedresults<-duplicated(surveys$subject_id)
surveys$name[duplicatedresults]<-paste0(surveys$subject_id[duplicatedresults],"-2")
CodePudding user response:
df %>%
group_by(day) %>%
mutate(name = janitor::make_clean_names(subject_id, use_make_names = FALSE))
# A tibble: 5 x 4
# Groups: day [3]
subject_id day data name
<int> <int> <int> <chr>
1 1 1 34 1
2 1 1 58 1_2
3 2 1 54 2
4 3 3 55 3
5 4 4 56 4