R - Creating multiple variables with loops

Time:12-21

I am using R to build API requests for data from a website and, as you can see below, there is a lot of repetition in my code. The repetition is forced by limitations of the site's API.

I would like a loop that iterates over the years in the first text string and automatically creates Df1 through Df5, then passes each string through command1 and then command2, again without the repetition.

Hopefully the question is clear and you can help.

Thanks :)

Df1 <- "search \\\"yyy\\\" where year in [2021] and in [\"xxxxxx\"] return zzz"
Df2 <- "search \\\"yyy\\\" where year in [2020] and in [\"xxxxxx\"] return zzz"
Df3 <- "search \\\"yyy\\\" where year in [2019] and in [\"xxxxxx\"] return zzz"
Df4 <- "search \\\"yyy\\\" where year in [2018] and in [\"xxxxxx\"] return zzz"
Df5 <- "search \\\"yyy\\\" where year in [2017] and in [\"xxxxxx\"] return zzz"

Df1 <- command1(query = Df1, token = token)
Df2 <- command1(query = Df2, token = token)
Df3 <- command1(query = Df3, token = token)
Df4 <- command1(query = Df4, token = token)
Df5 <- command1(query = Df5, token = token)

Final_Df1 <- command2(Df1, dbsource = "APISource", format = "api")
Final_Df2 <- command2(Df2, dbsource = "APISource", format = "api")
Final_Df3 <- command2(Df3, dbsource = "APISource", format = "api")
Final_Df4 <- command2(Df4, dbsource = "APISource", format = "api")
Final_Df5 <- command2(Df5, dbsource = "APISource", format = "api")

Data_Frame <- rbind(Final_Df1, Final_Df2, Final_Df3, Final_Df4, Final_Df5)

CodePudding user response:

This problem looks as if it needs metaprogramming, but actually it doesn't, if all you need for the rest of the program is the final data frame. I would do something like:

do_query <- function(y, year, x, z, token, dbsource="APISource", format="api") {
  # build the query string for one year, then run both commands on it
  s <- paste0("search \\\"", y, "\\\" where year in [", year, "] and in [\"", x, "\"] return ", z)
  df <- command1(query=s, token=token)
  command2(df, dbsource = dbsource, format = format)
}

do_queries <- function(params_list) {
  # params_list is a list of named lists, one set of do_query() arguments per query
  dfs <- lapply(params_list, function(params) do.call(do_query, params)) # this returns a list of data frames
  do.call(rbind, dfs) # rbind(dfs) would not merge the list; you may still have to fix the row names afterwards
}


# use it like this:
# generate the params_list:
params_list <- lapply(rev(2017:2021), function(year) list(y="yyy", 
                                                          year=year,
                                                          x="xxxxxx",
                                                          z="zzz",
                                                          token=token)) # do_query() has no default for token
# and then call do_queries over it
df <- do_queries(params_list)

Okay, probably, do.call counts as metaprogramming.

If you want such a function to perform the assignment one evaluation level above its own, i.e. in the environment it was called from, you could theoretically do something like:


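assign_strings <- function(years, y="yyy", x="xxxxxx", z="zzz") {
  strings <- sapply(years, function(year) paste0("search \\\"", y, "\\\" where year in [", year, "] and in [\"", x, "\"] return ", z))
  for (i in seq_along(strings)) {
    # envir = parent.frame() assigns in the caller's environment;
    # without it the variables would only exist inside this function
    assign(paste0("Df", i), strings[i], envir = parent.frame())
  }
}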

# Then,
assign_strings(rev(2017:2021))
# creates the strings Df1 ... Df5, which covers the first step of your question.
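
One detail worth checking with this approach: assign() writes into the environment it is called from unless envir is given, so for the variables to be visible outside the function the call needs envir = parent.frame(). A minimal sketch of the difference (the function and variable names here are made up for illustration):

```r
# Hypothetical helper: assigns without envir, so the variable
# only exists in the function's own frame and is gone afterwards.
make_local <- function() {
  assign("tmp_local", 1)
  invisible(NULL)
}

# Hypothetical helper: assigns into the caller's environment.
make_in_caller <- function() {
  assign("tmp_caller", 2, envir = parent.frame())
  invisible(NULL)
}

make_local()
make_in_caller()

local_visible  <- exists("tmp_local",  inherits = FALSE)
caller_visible <- exists("tmp_caller", inherits = FALSE)
print(c(local = local_visible, caller = caller_visible))
```

Only tmp_caller should be visible after the two calls.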

But this code is not very readable. In particular, the variables Df1, Df2, ... are generated, yet nowhere in the code do they appear in the form Df1 <- ..., Df2 <- ..., so you are lost when searching for where they were created. It is therefore better to keep all such values together in ONE named list, so that you can access them as dfs[["Df1"]], dfs[["Df2"]] or dfs[[1]], dfs[[2]], ... (single brackets, dfs["Df1"], return a one-element list rather than the element itself). Then you know exactly when and where the dfs list was created.
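
A minimal sketch of the list-based approach, under some assumptions: command1 and command2 are replaced by hypothetical stand-ins so the example runs without the real API, and the token value is a placeholder. Swap in the real functions and token.

```r
# Hypothetical stand-ins for the question's command1/command2;
# they are NOT the real API functions.
command1 <- function(query, token) data.frame(query = query, stringsAsFactors = FALSE)
command2 <- function(df, dbsource, format) df

token <- "dummy-token"  # placeholder, not a real token
years <- 2021:2017

# One query string per year; paste0 is vectorised over `years`
queries <- paste0("search \\\"yyy\\\" where year in [", years,
                  "] and in [\"xxxxxx\"] return zzz")
names(queries) <- paste0("Df", seq_along(years))

# Run both commands on each query; lapply keeps the names
dfs <- lapply(queries, function(q) {
  raw <- command1(query = q, token = token)
  command2(raw, dbsource = "APISource", format = "api")
})

first_df <- dfs[["Df1"]]            # a single data frame, accessed by name
Data_Frame <- do.call(rbind, dfs)   # all results merged into one data frame
```

Everything lives in the one dfs object, so there is a single place in the code where the results are created.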
