I have a data frame with one column containing more than a thousand names.
Name
Barack Obama
Xijin Ping
Bladimir Putin
John Smith
...
I want to create a new column of ethnicity scores retrieved from an API (https://www.name-prism.com/api). I have received an API token from the website, along with an example request for Barack Obama (http://www.name-prism.com/api_token/nat/csv/[API_token]/Barack Obama). The response in the browser looks like this:
European-SouthSlavs,0.0000 Muslim-Pakistanis-Bangladesh,0.0000 European-Italian-Italy,0.0000 European-Baltics,0.0000 African-SouthAfrican,0.0000 European-Italian-Romania,0.0031 Muslim-Nubian,0.0026 European-French,0.1359 EastAsian-Indochina-Thailand,0.0000 EastAsian-Indochina-Vietnam,0.0108 Jewish,0.0000 Muslim-Turkic-CentralAsian,0.0000 EastAsian-Indochina-Cambodia,0.0000 Nordic-Scandinavian-Denmark,0.0000 EastAsian-Indochina-Myanmar,0.0000 Nordic-Finland,0.0000 Muslim-Persian,0.0035 Nordic-Scandinavian-Sweden,0.0000 Muslim-Maghreb,0.0000 Greek,0.0000 Muslim-Pakistanis-Pakistan,0.0000 Hispanic-Portuguese,0.0003 European-Russian,0.0128 Muslim-ArabianPeninsula,0.0000 African-WestAfrican,0.0324 EastAsian-Japan,0.0000 European-German,0.0001 EastAsian-Chinese,0.0005 SouthAsian,0.0060 Hispanic-Spanish,0.0126 Nordic-Scandinavian-Norway,0.0000 Muslim-Turkic-Turkey,0.0000 Hispanic-Philippines,0.0001 CelticEnglish,0.0436 EastAsian-Malay-Malaysia,0.0041 EastAsian-South Korea,0.0000 African-EastAfrican,0.7259 European-EastEuropean,0.0000 EastAsian-Malay-Indonesia,0.0057
I managed to get a result for Barack Obama with the call below, but I am not sure how to do this for more than a thousand names.
result <- GET("http://www.name-prism.com/api_token/nat/csv/[API_token]/Barack Obama")
The desired outcome is as follows: for each name in the data frame, I want to add the ethnicity score from the web API as a new column.
Name // Ethnicity Score
Barack Obama // 0.781
Xijin Ping // 0.812
Bladimir Putin // 0.912
John Smith // 0.777
...
Thank you for your help in advance!
CodePudding user response:
Given your specification, this is what I can do. Suppose you have a data.frame called df_names whose column df_names$name holds the names to look up. The code below creates a new data frame, df_result, with each name and its top three ethnicities and their scores (for the names that were found). Note that the API may limit the number of requests you can make.
I hope this helps.
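library(httr)

# Assumes api_token holds your Name-Prism token as a string.
df_result <- purrr::map_dfr(df_names$name, function(name) {
  result <- GET(paste0("http://www.name-prism.com/api_token/nat/csv/",
                       api_token, "/", URLencode(name)))
  if (http_error(result)) {
    NULL  # skip names whose request failed
  } else {
    eth <- content(result, "text")
    # Split records after a digit so labels containing a space
    # (e.g. "EastAsian-South Korea") stay in one piece.
    eth <- strsplit(eth, split = "(?<=\\d)\\s+", perl = TRUE)[[1]]
    eth <- do.call(rbind, strsplit(eth, ","))
    # top three ethnicities by score
    top_eth <- eth[order(as.numeric(eth[, 2]), decreasing = TRUE)[1:3], ]
    # Name the fields so the rows bind into columns.
    as.list(setNames(c(name, as.vector(t(top_eth))),
                     c("name", "eth1", "score1", "eth2", "score2", "eth3", "score3")))
  }
})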
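If you only need one score per name, as in the desired output of the question, and want to stay under a possible request limit, something along these lines might work. It is only a sketch: the helper names (parse_prism, top_ethnicity, add_top_ethnicity, fetch_prism) are made up for illustration, the one-second delay is a guess rather than a documented limit, and it assumes the response is a list of "label,score" records separated by whitespace, as shown in the question.

```r
# Parse one Name-Prism CSV response into a data frame of labels and scores.
parse_prism <- function(txt) {
  # Split after a digit so labels containing a space stay intact;
  # \\s+ covers both newline- and space-separated records.
  records <- strsplit(trimws(txt), "(?<=\\d)\\s+", perl = TRUE)[[1]]
  parts <- do.call(rbind, strsplit(records, ",", fixed = TRUE))
  data.frame(ethnicity = parts[, 1],
             score = as.numeric(parts[, 2]),
             stringsAsFactors = FALSE)
}

# Return the single highest-scoring row of a response.
top_ethnicity <- function(txt) {
  eth <- parse_prism(txt)
  eth[which.max(eth$score), , drop = FALSE]
}

# Look up every name in df$name, pausing between requests.
# `fetch` is any function mapping a name to the raw response text
# (or NA on failure); `delay` is in seconds and is an assumed value.
add_top_ethnicity <- function(df, fetch, delay = 1) {
  df$ethnicity <- NA_character_
  df$score <- NA_real_
  for (i in seq_len(nrow(df))) {
    txt <- fetch(df$name[i])
    if (!is.na(txt)) {
      top <- top_ethnicity(txt)
      df$ethnicity[i] <- top$ethnicity
      df$score[i] <- top$score
    }
    if (i < nrow(df)) Sys.sleep(delay)
  }
  df
}

# Hypothetical fetcher built on httr, mirroring the request in the answer.
# Assumes api_token exists in your session.
fetch_prism <- function(name, token = api_token) {
  res <- httr::GET(paste0("http://www.name-prism.com/api_token/nat/csv/",
                          token, "/", URLencode(name)))
  if (httr::http_error(res)) {
    NA_character_
  } else {
    httr::content(res, "text", encoding = "UTF-8")
  }
}

# Offline check on a fragment of the response shown in the question.
sample_txt <- "European-French,0.1359 EastAsian-South Korea,0.0000 African-EastAfrican,0.7259"
top_ethnicity(sample_txt)
```

With a real token you would then call df_names <- add_top_ethnicity(df_names, fetch_prism). Passing the fetcher as an argument lets you try the parsing and the loop on canned text before sending a thousand requests.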