I'm reading data from a CSV file; a sample is shown below. Some records have the same label/question on different days, and I want to append a running number to those repeated questions.
UserID  Full Name   DOB     EncounterID  Date    Type   label                            responses
1       John Smith  1-1-90  13           1-1-21  Intro  Check Were you given any info?   (null)
1       John Smith  1-1-90  13           1-2-21  Intro  Check Were you given any info?   no
1       John Smith  1-1-90  13           1-3-21  Intro  Check Were you given any info?   yes
2       Jane Doe    2-2-80  14           1-6-21  Intro  Check Were you given any info?   no
2       Jane Doe    2-2-80  14           1-6-21  Care   Check By using this service..    no
2       Jane Doe    2-2-80  14           1-6-21  Out    Check How satisfied are you?     unsat
Desired output (I would like to add numbers to the repeated questions as you can see below):
UserID  Full Name   DOB     EncounterID  Date    Type   label                            responses
1       John Smith  1-1-90  13           1-1-21  Intro  Check Were you given any info?1  (null)
1       John Smith  1-1-90  13           1-2-21  Intro  Check Were you given any info?2  no
1       John Smith  1-1-90  13           1-3-21  Intro  Check Were you given any info?3  yes
2       Jane Doe    2-2-80  14           1-6-21  Intro  Check Were you given any info?   no
2       Jane Doe    2-2-80  14           1-6-21  Care   Check By using this service..    no
2       Jane Doe    2-2-80  14           1-6-21  Out    Check How satisfied are you?     unsat
CodePudding user response:
Here is a dplyr solution:
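library(dplyr)
df %>%
  group_by(UserID, label) %>%
  # newcol is the row's position within its UserID/label group
  mutate(newcol = row_number(),
         # sum(newcol) exceeds 1 only when the group has more than one row
         label = if (sum(newcol) > 1) paste0(label, newcol) else label) %>%
  ungroup() %>%
  select(-newcol)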
Or, more directly, as suggested by r2evans (many thanks!):
library(dplyr)
df %>%
  group_by(UserID, label) %>%
  mutate(label = if (n() > 1) paste0(label, row_number()) else label)
   UserID  Full.Name   DOB     EncounterID  Date    Type   label                            responses
   <int>   <chr>       <chr>   <int>        <chr>   <chr>  <chr>                            <chr>
1  1       John Smith  1-1-90  13           1-1-21  Intro  Check Were you given any info?1  (null)
2  1       John Smith  1-1-90  13           1-2-21  Intro  Check Were you given any info?2  no
3  1       John Smith  1-1-90  13           1-3-21  Intro  Check Were you given any info?3  yes
4  2       Jane Doe    2-2-80  14           1-6-21  Intro  Check Were you given any info?   no
5  2       Jane Doe    2-2-80  14           1-6-21  Care   Check By using this service..    no
6  2       Jane Doe    2-2-80  14           1-6-21  Out    Check How satisfied are you?     unsat
data:
df <- structure(list(UserID = c(1L, 1L, 1L, 2L, 2L, 2L), Full.Name = c("John Smith",
"John Smith", "John Smith", "Jane Doe", "Jane Doe", "Jane Doe"
), DOB = c("1-1-90", "1-1-90", "1-1-90", "2-2-80", "2-2-80",
"2-2-80"), EncounterID = c(13L, 13L, 13L, 14L, 14L, 14L), Date = c("1-1-21",
"1-2-21", "1-3-21", "1-6-21", "1-6-21", "1-6-21"), Type = c("Intro",
"Intro", "Intro", "Intro", "Care", "Out"), label = c("Check Were you given any info?",
"Check Were you given any info?", "Check Were you given any info?",
"Check Were you given any info?", "Check By using this service..",
"Check How satisfied are you?"), responses = c("(null)", "no",
"yes", "no", "no", "unsat")), class = "data.frame", row.names = c(NA,
-6L))
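To make it easier to see what the grouped `mutate()` is doing, here is a minimal sketch that exposes the two quantities the condition relies on. It assumes dplyr is installed and uses a reduced copy of the sample data (only `UserID` and `label`); the names `df_small`, `group_size`, `idx`, and `new_label` are hypothetical helpers introduced just for this illustration.

```r
library(dplyr)

# Reduced copy of the sample data: only the columns that drive the grouping.
df_small <- data.frame(
  UserID = c(1L, 1L, 1L, 2L, 2L, 2L),
  label = c(rep("Check Were you given any info?", 4),
            "Check By using this service..",
            "Check How satisfied are you?"),
  stringsAsFactors = FALSE
)

steps <- df_small %>%
  group_by(UserID, label) %>%
  mutate(
    group_size = n(),           # number of rows sharing this UserID + label
    idx        = row_number(),  # position of the row within its group
    # Append the position only when the group holds more than one row.
    new_label  = if (n() > 1) paste0(label, row_number()) else label
  ) %>%
  ungroup()

print(steps)
```

Because the grouping includes `UserID`, Jane Doe's single "Were you given any info?" row forms its own group of size 1 and keeps its label unchanged, which matches the desired output in the question.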
CodePudding user response:
Try this:
ave(dat$label, dat[c("UserID", "label")],
    FUN = function(z) if (length(z) > 1) seq_along(z) else "")
# [1] "1" "2" "3" "" "" ""
which can be used as
dat$label <- paste0(dat$label,
  ave(dat$label, dat[c("UserID", "label")],
      FUN = function(z) if (length(z) > 1) seq_along(z) else "")
)
#    UserID  Full.Name   DOB     EncounterID  Date    Type   label                            responses
# 1  1       John Smith  1-1-90  13           1-1-21  Intro  Check Were you given any info?1  (null)
# 2  1       John Smith  1-1-90  13           1-2-21  Intro  Check Were you given any info?2  no
# 3  1       John Smith  1-1-90  13           1-3-21  Intro  Check Were you given any info?3  yes
# 4  2       Jane Doe    2-2-80  14           1-6-21  Intro  Check Were you given any info?   no
# 5  2       Jane Doe    2-2-80  14           1-6-21  Care   Check By using this service..    no
# 6  2       Jane Doe    2-2-80  14           1-6-21  Out    Check How satisfied are you?     unsat
Data
dat <- structure(list(UserID = c(1, 1, 1, 2, 2, 2), Full.Name = c("John Smith", "John Smith", "John Smith", "Jane Doe", "Jane Doe", "Jane Doe"), DOB = c("1-1-90", "1-1-90", "1-1-90", "2-2-80", "2-2-80", "2-2-80"), EncounterID = c(13, 13, 13, 14, 14, 14), Date = c("1-1-21", "1-2-21", "1-3-21", "1-6-21", "1-6-21", "1-6-21"), Type = c("Intro", "Intro", "Intro", "Intro", "Care", "Out"), label = c("Check Were you given any info?", "Check Were you given any info?", "Check Were you given any info?", "Check Were you given any info?", "Check By using this service..", "Check How satisfied are you?"), responses = c("(null)", "no", "yes", "no", "no", "unsat")), row.names = c(NA, -6L), class = "data.frame")
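As a rough illustration of why the grouping uses both `UserID` and `label`, the base R sketch below compares grouping by `label` alone with grouping by both columns. It works on a reduced copy of the data above; `dat_small`, `by_label`, and `by_both` are hypothetical names used only for this example.

```r
# Reduced copy of the sample data (UserID and label only).
dat_small <- data.frame(
  UserID = c(1, 1, 1, 2, 2, 2),
  label = c(rep("Check Were you given any info?", 4),
            "Check By using this service..",
            "Check How satisfied are you?"),
  stringsAsFactors = FALSE
)

# Suffix for a group: a running index if it has several rows, otherwise "".
suffix <- function(z) if (length(z) > 1) seq_along(z) else ""

# Grouping by label alone also numbers Jane Doe's single row,
# because her label text matches John Smith's three rows.
by_label <- ave(dat_small$label, dat_small$label, FUN = suffix)

# Grouping by UserID and label numbers repeats within each user only.
by_both <- ave(dat_small$label, dat_small[c("UserID", "label")], FUN = suffix)

print(by_label)
print(by_both)
```

Only the second grouping leaves Jane Doe's label without a number, as requested in the question.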