I need to standardize the text values in a column of university names so that every variant of the same university ends up with the same name. An example of my table is listed below. Of course, there are many other university names, people, and columns; I just need to change this part of the data frame.
Name | Affiliation |
---|---|
Jose Ramayana | OXFORD UNIVERSITY |
Andres Andresius | OFORD UNIVERSITY |
Pepito Perez | UNIVERSIDAD NACIONAL |
Cacolo Osorio | Universidad Nacional de Bogotá |
Maleja Patras | Unievrsidad del Valle |
Tigre Tony | Universidad Nacional |
Pocho Valencia | UNIVERSIDAD DEL VALLE |
Puti Gutierrez | OXFORD UNIVERSITY |
Chuchi Lopez | UPTC |
Ganso Salazar | Uptc |
Santiago Andrade | PONTIFICIA UNIVERSIDAD JAVERIANA |
Andrés Tigreros | JAVERIANA CALI |
I tried the following code, but it only replicated each person many times (at least 10 times each).
DB_CO1 <- DB_CO %>%
  mutate(FinalAssociation = map(affiliation, ~ DB_CO$affiliation[str_detect(.x, DB_CO$affiliation)])) %>%
  unnest(cols = c(FinalAssociation))
Desired result: all variants of the same affiliation are replaced by one consistent name.
Name | Affiliation |
---|---|
Jose Ramayana | OXFORD UNIVERSITY |
Andres Andresius | OXFORD UNIVERSITY |
Pepito Perez | UNIVERSIDAD NACIONAL DE BOGOTÁ |
Cacolo Osorio | UNIVERSIDAD NACIONAL DE BOGOTÁ |
Maleja Patras | UNIVERSIDAD DEL VALLE |
Tigre Tony | UNIVERSIDAD NACIONAL DE BOGOTÁ |
Pocho Valencia | UNIVERSIDAD DEL VALLE |
Puti Gutierrez | OXFORD UNIVERSITY |
Chuchi Lopez | UPTC |
Ganso Salazar | UPTC |
Santiago Andrade | PONTIFICIA UNIVERSIDAD JAVERIANA CALI |
Andrés Tigreros | PONTIFICIA UNIVERSIDAD JAVERIANA CALI |
Thanks a lot in advance for your help.
CodePudding user response:
This agrep solution relies on several assumptions:
- A fuzzy match between the items is possible (i.e. no heavily abbreviated names such as UN).
- The longer string is the desired name.
- No ambiguities occur.
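# For each affiliation, collect every affiliation that approximately matches it
dat_n <- sapply(dat$Affiliation, function(x)
  dat$Affiliation[agrep(x, dat$Affiliation, ignore.case = TRUE)])
# Keep the longest match for each row and convert it to upper case
dat$Affiliation_new <- toupper(unlist(sapply(dat_n, function(x)
  x[which.max(nchar(x))])))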
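To make the two steps easier to follow, here is a minimal sketch on a three-element vector. The vector `aff` and the names `candidates` and `aff_new` are made up for illustration. It uses `lapply` and `vapply` rather than `sapply` so that the result types stay predictable, which is an assumption about what is preferable and not part of the code above.

```r
# Hypothetical mini-example, not the full data set
aff <- c("OXFORD UNIVERSITY", "OFORD UNIVERSITY", "UPTC")

# Step 1: for each entry, collect all entries that approximately match it.
# agrep() uses its default max.distance here, as in the code above.
candidates <- lapply(aff, function(x) aff[agrep(x, aff, ignore.case = TRUE)])

# Step 2: keep the longest candidate for each entry and upper-case it
aff_new <- toupper(vapply(candidates,
                          function(x) x[which.max(nchar(x))],
                          character(1)))

print(data.frame(aff, aff_new))
```

If too many or too few names are grouped together, the `max.distance` argument of `agrep` is the knob to try adjusting.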
Name Affiliation
1 Jose Ramayana OXFORD UNIVERSITY
2 Andres Andresius OFORD UNIVERSITY
3 Pepito Perez UNIVERSIDAD NACIONAL
4 Cacolo Osorio Universidad Nacional de Bogotá
5 Maleja Patras Unievrsidad del Valle
6 Tigre Tony Universidad Nacional
7 Pocho Valencia UNIVERSIDAD DEL VALLE
8 Puti Gutierrez OXFORD UNIVERSITY
9 Chuchi Lopez UPTC
10 Ganso Salazar Uptc
11 Santiago Andrade PONTIFICIA UNIVERSIDAD JAVERIANA
12 Andrés Tigreros JAVERIANA CALI
Affiliation_new
1 OXFORD UNIVERSITY
2 OXFORD UNIVERSITY
3 UNIVERSIDAD NACIONAL DE BOGOTÁ
4 UNIVERSIDAD NACIONAL DE BOGOTÁ
5 UNIEVRSIDAD DEL VALLE
6 UNIVERSIDAD NACIONAL DE BOGOTÁ
7 UNIEVRSIDAD DEL VALLE
8 OXFORD UNIVERSITY
9 UPTC
10 UPTC
11 PONTIFICIA UNIVERSIDAD JAVERIANA
12 JAVERIANA CALI
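Note that rows 5 and 7 keep the misspelled "UNIEVRSIDAD DEL VALLE" (both variants have the same length, and `which.max` returns the first one), and rows 11 and 12 are not merged. One possible way around both problems, sketched here under the assumption that you can write down the correct names by hand, is to match every raw value against a lookup vector using `adist`. The vector `canonical`, the subset `raw`, and the review threshold of 3 are all hypothetical choices for illustration.

```r
# Hypothetical lookup vector of canonical names; curate this by hand
canonical <- c("OXFORD UNIVERSITY", "UNIVERSIDAD DEL VALLE", "UPTC",
               "PONTIFICIA UNIVERSIDAD JAVERIANA CALI")

# A few of the raw affiliations from the question
raw <- c("OFORD UNIVERSITY", "Unievrsidad del Valle", "Uptc",
         "JAVERIANA CALI", "PONTIFICIA UNIVERSIDAD JAVERIANA")

# Edit distance from each raw value to the best-matching substring of each
# canonical name (rows = raw values, columns = canonical names)
d <- adist(raw, canonical, ignore.case = TRUE, partial = TRUE)

# For each raw value, pick the canonical name with the smallest distance
cleaned <- canonical[apply(d, 1, which.min)]

# Flag weak matches for manual review; the threshold 3 is an arbitrary assumption
needs_review <- apply(d, 1, min) > 3

print(data.frame(raw, cleaned, needs_review))
```

Because `partial = TRUE` matches against substrings, very short raw values can sit close to several canonical names, so it is worth inspecting the flagged rows and any ties before overwriting the original column.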
Data
dat <- structure(list(Name = c("Jose Ramayana", "Andres Andresius",
"Pepito Perez", "Cacolo Osorio", "Maleja Patras", "Tigre Tony",
"Pocho Valencia", "Puti Gutierrez", "Chuchi Lopez", "Ganso Salazar",
"Santiago Andrade", "Andrés Tigreros"), Affiliation = c("OXFORD UNIVERSITY",
"OFORD UNIVERSITY", "UNIVERSIDAD NACIONAL", "Universidad Nacional de Bogotá",
"Unievrsidad del Valle", "Universidad Nacional", "UNIVERSIDAD DEL VALLE",
"OXFORD UNIVERSITY", "UPTC", "Uptc", "PONTIFICIA UNIVERSIDAD JAVERIANA",
"JAVERIANA CALI")), class = "data.frame", row.names = c(NA, -12L
))