I am working with the R programming language. Suppose I have the following data:
Data_I_Have <- data.frame(
"Person" = c("John", "John", "John", "Peter", "Peter", "Peter", "Tim", "Kevin", "Adam", "Adam", "Xavier"),
"Number_of_Kids" = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)
Person Number_of_Kids
1 John 4
2 John 1
3 John 1
4 Peter 5
5 Peter 2
6 Peter 3
7 Tim 7
8 Kevin 0
9 Adam 3
10 Adam 3
11 Xavier 5
Is it possible to "mark" each duplicate name so that it looks like the file below (e.g. John_1, John_2, etc.)?
Data_I_Want <- data.frame(
"Person" = c("John_1", "John_2", "John_3", "Peter_1", "Peter_2", "Peter_3", "Tim", "Kevin", "Adam_1", "Adam_2", "Xavier"),
"Number_of_Kids" = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)
Person Number_of_Kids
1 John_1 4
2 John_2 1
3 John_3 1
4 Peter_1 5
5 Peter_2 2
6 Peter_3 3
7 Tim 7
8 Kevin 0
9 Adam_1 3
10 Adam_2 3
11 Xavier 5
Using this previous question Add specific characters to duplicated strings, I tried to follow the method used there:
Data_I_Want <- make.unique(Data_I_Have, sep = '_')
But this gives me the following error:
Error in make.unique(Data_I_Have, sep = "_") :
'names' must be a character vector
Can someone please show me how to fix this?
Thanks!
CodePudding user response:
make.unique
expects a vector and not a data.frame and the output by default will append 1, 2, 3 with .
(as sep
only from the duplicate values and not from the start. i.e.
> make.unique(Data_I_Have$Person)
[1] "John" "John.1" "John.2" "Peter" "Peter.1" "Peter.2" "Tim" "Kevin" "Adam" "Adam.1" "Xavier"
If we want to get the desired output, group by 'Person', and concatenate the row_number()
with the group column and then ungroup()
it.
library(dplyr)
library(stringr)
Data_I_Have %>%
group_by(Person) %>%
mutate(Person = case_when(n() > 1 ~
str_c(Person, "_", row_number()), TRUE ~ Person)) %>%
ungroup()
-output
# A tibble: 11 x 2
Person Number_of_Kids
<chr> <chr>
1 John_1 4
2 John_2 1
3 John_3 1
4 Peter_1 5
5 Peter_2 2
6 Peter_3 3
7 Tim 7
8 Kevin 0
9 Adam_1 3
10 Adam_2 3
11 Xavier 5
CodePudding user response:
Here is an alternative way (update, corrected version):
library(dplyr)
Data_I_Have %>%
group_by(Person) %>%
mutate(id = row_number(),
n = n(),
Person = ifelse(n >1, paste(Person, id, sep="_"), Person)) %>%
select(-id, -n)
Person Number_of_Kids
<chr> <chr>
1 John_1 4
2 John_2 1
3 John_3 1
4 Peter_1 5
5 Peter_2 2
6 Peter_3 3
7 Tim 7
8 Kevin 0
9 Adam_1 3
10 Adam_2 3
11 Xavier 5
CodePudding user response:
A tidyverse option using cumsum()
.
library(dplyr)
Data_I_Have %>%
group_by(Person) %>%
mutate(cnt = 1,
Person = str_c(Person, cumsum(cnt), sep = '_')) %>%
ungroup() %>%
select(-cnt)
# # A tibble: 11 x 2
# Person Number_of_Kids
# <chr> <chr>
# 1 John_1 4
# 2 John_2 1
# 3 John_3 1
# 4 Peter_1 5
# 5 Peter_2 2
# 6 Peter_3 3
# 7 Tim_1 7
# 8 Kevin_1 0
# 9 Adam_1 3
# 10 Adam_2 3
# 11 Xavier_1 5
CodePudding user response:
Using data.table, where .N
is a symbol for the number of rows in the group.
Data_I_Have <- data.frame(
"Person" = c("John", "John", "John", "Peter", "Peter", "Peter", "Tim", "Kevin", "Adam", "Adam", "Xavier"),
"Number_of_Kids" = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)
library(data.table)
setDT(Data_I_Have)
Data_I_Have[, Person := if (.N == 1) Person else paste0(Person, "_", seq(.N)),
by = Person]
Data_I_Have
#> Person Number_of_Kids
#> 1: John_1 4
#> 2: John_2 1
#> 3: John_3 1
#> 4: Peter_1 5
#> 5: Peter_2 2
#> 6: Peter_3 3
#> 7: Tim 7
#> 8: Kevin 0
#> 9: Adam_1 3
#> 10: Adam_2 3
#> 11: Xavier 5
Created on 2021-09-15 by the reprex package (v2.0.1)