I have written a couple of functions that translate a name (e.g. "Harrington") into its equivalent phone number ("4277464866"). One function, get_number(), takes a single character and returns its corresponding number ("h" to "4"), while translate_name_to_number() takes a data frame column and returns the phone number equivalent ("Harrington" to "4277464866"). When I put a print() inside the translate_name_to_number() function, it prints out a bunch of varied, correct strings. Specifically, this function uses a couple of for loops to iterate over the rows, then over each character within the row:
translate_name_to_number <- function(the_column) {
  for (the_string in the_column) { # iterate once per row
    name_vector <- strsplit(the_string, "")[[1]] # split into a vector to iterate over it
    built_phone_number <- "" # initialize a string to hold the built number
    for (a_letter in name_vector) { # for each letter in the name vector...
      built_phone_number <- str_c(built_phone_number, get_number(a_letter)) # ...add its number
    }
    print(built_phone_number) # print the concatenated result
    # return(built_phone_number) #...but this only returns one value
  }
}
When the print statement is active, I get a good result, like:
[1] "7424273766"
[1] "4277464866"
[1] "6668466379"
[1] "9455426766"
[1] "8455277325"
[1] "7424273766"
[1] "7366464866"
[1] "4277464866"
This is great output, and is what I want to place in a new column. So I take my data frame and try to apply the translate_name_to_number() function using the pipe and dplyr::mutate(). However, when I do, I get a new column whose rows only contain the value for the final name, not a unique value for each row.
I'm clearly missing a concept. Can someone illustrate what the problem is?
CodePudding user response:
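This is because your function is not vectorized: mutate() calls it once with the whole column, so it has to return one value per row, and a single returned string is simply recycled into every row. If you have a data frame df with a column name, you can use rowwise() so that the function is called once per row: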
df %>%
  rowwise() %>%
  mutate(phone_number = translate_name_to_number(name)) %>%
  ungroup()
Also, as @r2evans pointed out in the comments, the function should be rewritten as:
# for dplyr
translate_name_to_number <- function(the_column) {
  for (the_string in the_column) { # iterate once per row
    name_vector <- strsplit(the_string, "")[[1]] # split into a vector to iterate over it
    built_phone_number <- "" # initialize a string to hold the built number
    for (a_letter in name_vector) { # for each letter in the name vector...
      built_phone_number <- str_c(built_phone_number, get_number(a_letter)) # ...add its number
    }
    # print(built_phone_number) # print the concatenated result
    return(built_phone_number) # exits on the first name; fine under rowwise(), where the_column holds a single name
  }
}
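If you would rather not depend on rowwise(), here is a minimal sketch of a vectorized alternative using base R's chartr(), which translates every string in a vector at once. Note that name_to_number_vec is a hypothetical name, and the letter-to-digit mapping assumes the standard phone keypad, which may not match exactly what your get_number() does:

```r
# Hypothetical vectorized helper (not from the original question).
# Assumes the standard keypad: abc=2, def=3, ghi=4, jkl=5, mno=6, pqrs=7, tuv=8, wxyz=9.
name_to_number_vec <- function(the_column) {
  chartr(
    paste(letters, collapse = ""),   # "abcdefghijklmnopqrstuvwxyz"
    "22233344455566677778889999",    # the digit for each letter, in the same order
    tolower(the_column)              # characters outside a-z are left unchanged
  )
}

df <- data.frame(name = c("Harrington", "Richardson"), stringsAsFactors = FALSE)
df$phone_number <- name_to_number_vec(df$name)
print(df$phone_number)
# [1] "4277464866" "7424273766"
```

Because it returns one element per input element, the same function should also work directly inside mutate(), with no rowwise() or ungroup() needed.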
CodePudding user response:
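A function can only return once; you can't return from it multiple times in a loop. A return() inside the loop exits on the first iteration, and a value returned after the loop is only the last one built.
I would suggest adding your values to a list (or vector) inside the loop and returning it once the loop has finished.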
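A rough sketch of that idea, keeping your two loops as they are. It uses a character vector in place of a list, since that is the shape a new column needs. The get_number() below is only a hypothetical stand-in (yours was not shown), paste0() replaces str_c() to keep the sketch in base R, and the_column is assumed to be a character vector rather than a factor:

```r
# Hypothetical stand-in for get_number(): standard keypad lookup by letter.
keypad <- setNames(strsplit("22233344455566677778889999", "")[[1]], letters)
get_number <- function(a_letter) unname(keypad[tolower(a_letter)])  # NA for non-letters

translate_name_to_number <- function(the_column) {
  result <- character(length(the_column))          # one slot per row
  for (i in seq_along(the_column)) {               # iterate once per row
    name_vector <- strsplit(the_column[i], "")[[1]]
    built_phone_number <- ""
    for (a_letter in name_vector) {                # add each letter's digit
      built_phone_number <- paste0(built_phone_number, get_number(a_letter))
    }
    result[i] <- built_phone_number                # store instead of returning
  }
  result                                           # return once, after the loop
}

print(translate_name_to_number(c("Harrington", "Richardson")))
# [1] "4277464866" "7424273766"
```

Since the result has the same length as the input, it can be used inside mutate() without rowwise().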