I am using R to request data from a website's API and, as you can see below, there is a lot of repetition in my code. The repetition is forced by the limitations of the site's API.
I would like to create a loop that iterates over the years in the first query string and automatically creates Df1 through Df5, then passes each string through command1 and then command2, again without the repetition.
Hopefully, the question is clear and you can help
Thanks :)
Df1 <- "search \\\"yyy\\\" where year in [2021] and in [\"xxxxxx\"] return zzz"
Df2 <- "search \\\"yyy\\\" where year in [2020] and in [\"xxxxxx\"] return zzz"
Df3 <- "search \\\"yyy\\\" where year in [2019] and in [\"xxxxxx\"] return zzz"
Df4 <- "search \\\"yyy\\\" where year in [2018] and in [\"xxxxxx\"] return zzz"
Df5 <- "search \\\"yyy\\\" where year in [2017] and in [\"xxxxxx\"] return zzz"
Df1 <- command1(query = Df1, token = token)
Df2 <- command1(query = Df2, token = token)
Df3 <- command1(query = Df3, token = token)
Df4 <- command1(query = Df4, token = token)
Df5 <- command1(query = Df5, token = token)
Final_Df1 <- command2(Df1, dbsource = "APISource", format = "api")
Final_Df2 <- command2(Df2, dbsource = "APISource", format = "api")
Final_Df3 <- command2(Df3, dbsource = "APISource", format = "api")
Final_Df4 <- command2(Df4, dbsource = "APISource", format = "api")
Final_Df5 <- command2(Df5, dbsource = "APISource", format = "api")
Data_Frame <- rbind(Final_Df1, Final_Df2, Final_Df3, Final_Df4, Final_Df5)
CodePudding user response:
This problem looks as if it needs metaprogramming, but it doesn't if all you need is the final data frame for the rest of the program. I would do something like:
do_query <- function(y, year, x, z, token, dbsource = "APISource", format = "api") {
  s <- paste0("search \\\"", y, "\\\" where year in [", year, "] and in [\"", x, "\"] return ", z)
  df <- command1(query = s, token = token)
  command2(df, dbsource = dbsource, format = format)
}
do_queries <- function(params_list) {
  # params_list is a list of named lists, one set of arguments per query
  dfs <- lapply(params_list, function(params) do.call(do_query, params))  # this returns a list of data frames
  do.call(rbind, dfs)  # this merges them into one data frame; you may need to reset the row names afterwards
}
# use it like this:
# generate the params_list:
params_list <- lapply(rev(2017:2021), function(year) list(y = "yyy",
                                                          year = year,
                                                          x = "xxxxxx",
                                                          z = "zzz",
                                                          token = token))  # do_query has no default for token
# and then call do_queries over it
df <- do_queries(params_list)
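The same pattern can be tried offline with a minimal sketch. Here command1 and command2 are replaced by hypothetical stubs (the real ones come from your API package and will return something different), build_query is a helper name made up for this illustration, and the token is a dummy value:

```r
# Hypothetical stand-ins for the real API functions, so the sketch runs offline.
command1 <- function(query, token) list(query = query)
command2 <- function(x, dbsource, format) {
  data.frame(query = x$query, source = dbsource, stringsAsFactors = FALSE)
}

# Build the query string for one year (helper name is an assumption).
build_query <- function(year, y = "yyy", x = "xxxxxx", z = "zzz") {
  paste0("search \\\"", y, "\\\" where year in [", year,
         "] and in [\"", x, "\"] return ", z)
}

token <- "dummy-token"  # placeholder, not a real credential
years <- 2021:2017

# One data frame per year, collected in a list.
results <- lapply(years, function(year) {
  raw <- command1(query = build_query(year), token = token)
  command2(raw, dbsource = "APISource", format = "api")
})

# Stack the list of data frames into a single data frame.
Data_Frame <- do.call(rbind, results)
rownames(Data_Frame) <- NULL
```

Note that the list is combined with do.call(rbind, results): calling rbind directly on the list would not stack the data frames row by row.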
Okay, do.call probably counts as metaprogramming.
If you want such a function to create variables one evaluation level above itself, that is, in the environment it was called from, you could in theory do something like:
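assign_strings <- function(years, y = "yyy", x = "xxxxxx", z = "zzz") {
  strings <- sapply(years, function(year) paste0("search \\\"", y, "\\\" where year in [", year, "] and in [\"", x, "\"] return ", z))
  for (i in seq_along(strings)) {
    # envir = parent.frame() creates Df1, Df2, ... in the caller's environment;
    # without it they would exist only inside this function
    assign(paste0("Df", i), strings[i], envir = parent.frame())
  }
}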
# Then,
assign_strings(rev(2017:2021))
# will do exactly what you want for the first step (creating Df1 to Df5).
But this code is not very readable. In particular, the variables "Df1", "Df2", ... are generated, yet nowhere in the code can you see them defined in the form Df1 <- ..., Df2 <- ..., so you are lost when searching for where they were created.
Therefore, it is better to keep all such variables together in ONE list and give the list names, so that you can access them like dfs[["Df1"]], dfs[["Df2"]] or dfs[[1]], dfs[[2]], ... (single brackets, as in dfs["Df1"], return a one-element list rather than the element itself).
And you know exactly when and where you created the dfs object/list.
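As a rough sketch of the named-list approach (the name queries is chosen here only for illustration, and the query text is the placeholder from the question):

```r
years <- 2021:2017

# Build all query strings in one place and keep them in a single list.
queries <- lapply(years, function(year) {
  paste0("search \\\"yyy\\\" where year in [", year,
         "] and in [\"xxxxxx\"] return zzz")
})
names(queries) <- paste0("Df", seq_along(years))

one_string <- queries[["Df1"]]  # the element itself (a character string)
one_slot   <- queries["Df1"]    # a list of length one that contains it
```

The list is created on one visible line, so searching the code for queries leads straight to where every entry was defined.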