I have a dataframe where each row represents a person, and the columns represent the names they have gone by. Some of the values are NAs or are duplicates. The data looks like the dataframe immediately below.
Name | Name1 | Name2 | Name3 | Name4 |
---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy |
Jim | NA | James | NA | Jimmy |
Dave | Dave | David | NA | Davey |
Tim | NA | Timothy | Tim | Timmy |
Rob | Rob | NA | Rob | Robby |
Sam | NA | NA | Sam | NA |
I would like to combine the unique names from each row and put them into a new column, where each name only occurs once. I know i can use the paste function to produce a column where all the text values appear like this:
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | NA | Tommy | Tom, Tom, Thomas, NA, Tommy |
but I don't want the same text to appear multiple times in the unique column. How can I combine the row data such that each name only occurs once in the new $unique cell?
Name | Name1 | Name2 | Name3 | Name4 | unique |
---|---|---|---|---|---|
Tom | Tom | Thomas | Tom | Tommy | Tom, Thomas, Tommy |
Jim | NA | James | NA | Jimmy | Jim, James, Jimmy |
Dave | Dave | David | NA | Davey | Dave, David, Davey |
Tim | NA | Timothy | Tim | Timmy | Tim, Timothy, Timmy |
Rob | Rob | NA | Rob | Robby | Rob, Robert, Robby |
Sam | NA | NA | Sam | NA | Sam |
Any help would be greatly appreciated.
CodePudding user response:
Using tidyverse
library(dplyr)
df1 %>%
rowwise %>%
mutate(unique = toString(unique(na.omit(c_across(everything()))))) %>%
ungroup
-output
# A tibble: 6 × 6
Name Name1 Name2 Name3 Name4 unique
<chr> <chr> <chr> <chr> <chr> <chr>
1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 Dave Dave David <NA> Davey Dave, David, Davey
4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 Rob Rob <NA> Rob Robby Rob, Robby
6 Sam <NA> <NA> Sam <NA> Sam
CodePudding user response:
apply
ing unique
row-wise, na.omit
and collapse toString
.
transform(dat, unique=apply(dat, 1, \(x) toString(na.omit(unique(x)))))
# Name Name1 Name2 Name3 Name4 unique
# 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
# 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
# 3 Dave Dave David <NA> Davey Dave, David, Davey
# 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
# 5 Rob Rob <NA> Rob Robby Rob, Robby
# 6 Sam <NA> <NA> Sam <NA> Sam
You could additionally implement a sort
if you like.
Data:
dat <- structure(list(Name = c("Tom", "Jim", "Dave", "Tim", "Rob", "Sam"
), Name1 = c("Tom", NA, "Dave", NA, "Rob", NA), Name2 = c("Thomas",
"James", "David", "Timothy", NA, NA), Name3 = c("Tom", NA, NA,
"Tim", "Rob", "Sam"), Name4 = c("Tommy", "Jimmy", "Davey", "Timmy",
"Robby", NA)), class = "data.frame", row.names = c(NA, -6L))
CodePudding user response:
df%>%
rowid_to_column()%>%
left_join(pivot_longer(.,-rowid)%>%
group_by(rowid)%>%
summarise(value=toString(na.omit(unique(value))), .groups = 'drop'))
rowid Name Name1 Name2 Name3 Name4 value
1 1 Tom Tom Thomas Tom Tommy Tom, Thomas, Tommy
2 2 Jim <NA> James <NA> Jimmy Jim, James, Jimmy
3 3 Dave Dave David <NA> Davey Dave, David, Davey
4 4 Tim <NA> Timothy Tim Timmy Tim, Timothy, Timmy
5 5 Rob Rob <NA> Rob Robby Rob, Robby
6 6 Sam <NA> <NA> Sam <NA> Sam