Is it possible in R to use the input dataframe name in a self-made function that creates dataframes?

Time:10-28

I have a working function of around 250 lines. Here is a simplified version:

myfunction <- function(x){
  
  WithoutNA <<- x[!(is.na(x$Height)),]
  Heavy <- WithoutNA[WithoutNA$Weight >= "150",]
  Light <- WithoutNA[WithoutNA$Weight < "150",]

  HL <<- Heavy[Heavy$FurColor=="light_Brown",]
  HD <<- Heavy[Heavy$FurColor=="Dark_Brown",]

  LL <<- Light[Light$FurColor=="light_Brown",]
  LD <<- Light[Light$FurColor=="Dark_Brown",]
}

So this function creates 4 different dataframes, excluding rows where no Height is present and separated by weight and fur color. The problem I encounter is that if I use this function on two different dataframes, the second call will of course overwrite the 4 dataframes created by the first call.

If I type in:

myfunction(Horse)
myfunction(Pony)

I would like 8 dataframes called HL_Horse, HD_Horse, LL_Horse, LD_Horse, HL_Pony, HD_Pony, LL_Pony and LD_Pony.

But I can't figure out how to get the name of the input dataframe into the names of the newly produced dataframes. Is it even possible to make a 'variable' dataframe name?

CodePudding user response:

This entire concept is flawed. R is a (largely) functional programming language, and users don't expect functions to have side effects, particularly (over)writing objects in the global workspace, which is what <<- does here. A far better idea is to have your function return a list of data frames.

Lists are better than writing directly to the global workspace for a number of reasons. They avoid cluttering the global workspace, they can be iterated over, their elements can be named or unnamed, they can be nested, they can be converted into environments, and they can act as a container to allow a function to return multiple objects - just as in your example.
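As a small illustration of those points, the sketch below builds a named list of toy data frames, iterates over it, and copies it into a fresh environment with list2env(). The object names (frames, row_counts, e) and the values are made up for the example.

```r
# Toy named list of data frames (illustrative values only)
frames <- list(
  HL = data.frame(Weight = c(200, 210)),
  LD = data.frame(Weight = 100)
)

# Iterate over the list: count the rows of every element
row_counts <- sapply(frames, nrow)
row_counts

# Elements are reachable by name
frames$HL

# Copy the elements into a new environment, leaving the global workspace alone
e <- list2env(frames, envir = new.env())
ls(e)
```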

The standard R way to use a function like yours would be something like this:

myfunction <- function(x){
  
  # Drop rows with a missing Height
  WithoutNA <- x[!(is.na(x$Height)),]
  # Compare against the number 150; the string "150" would force a text comparison
  Heavy <- WithoutNA[WithoutNA$Weight >= 150,]
  Light <- WithoutNA[WithoutNA$Weight < 150,]
  
  HL <- Heavy[Heavy$FurColor=="light_Brown",]
  HD <- Heavy[Heavy$FurColor=="Dark_Brown",]
  
  LL <- Light[Light$FurColor=="light_Brown",]
  LD <- Light[Light$FurColor=="Dark_Brown",]
  
  # Return all four data frames in one named list
  return(list(HL = HL, HD = HD, LL = LL, LD = LD))
}

Now if we give it some toy data:

df <- data.frame(Height = c(2, 2, 2, 2),
                 Weight = c(100, 100, 200, 200),
                 FurColor = rep(c("light_Brown", "Dark_Brown"), 2))

horse <- myfunction(df)
pony <- myfunction(df)

We can access each of the 8 data frames easily by doing, for example:

horse$HL
#>   Height Weight    FurColor
#> 3      2    200 light_Brown

pony$LD
#>   Height Weight   FurColor
#> 2      2    100 Dark_Brown

Note that getting to each data frame involves the same number of characters as each of your named data frames, except you now have all the other benefits of having your data frames safely and logically stored as lists.
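If names of the form HL_Horse are still wanted, one possible approach (a sketch, not part of the recommendation above) is to capture the name of the argument with deparse(substitute(x)) and use it to label the elements of the returned list. The function name split_named and the toy Horse data are hypothetical, and Weight is assumed to be a numeric column.

```r
split_named <- function(x) {
  # Capture the name of the object that was passed in, e.g. "Horse"
  input_name <- deparse(substitute(x))

  without_na <- x[!is.na(x$Height), ]
  heavy <- without_na[without_na$Weight >= 150, ]
  light <- without_na[without_na$Weight < 150, ]

  out <- list(
    HL = heavy[heavy$FurColor == "light_Brown", ],
    HD = heavy[heavy$FurColor == "Dark_Brown", ],
    LL = light[light$FurColor == "light_Brown", ],
    LD = light[light$FurColor == "Dark_Brown", ]
  )
  # Append the input name to every element name: HL_Horse, HD_Horse, ...
  names(out) <- paste(names(out), input_name, sep = "_")
  out
}

# Toy data (illustrative only); the second row has no Height and is dropped
Horse <- data.frame(Height = c(2, NA, 2, 2),
                    Weight = c(100, 100, 200, 200),
                    FurColor = rep(c("light_Brown", "Dark_Brown"), 2))

horse_frames <- split_named(Horse)
names(horse_frames)
```

This only yields a useful name when the function is called directly with a plain variable name. Called through lapply(), or with an expression as the argument, deparse(substitute(x)) returns the text of that expression instead.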

If you want to make your global workspace even less cluttered, you can even nest the lists so that all your data frames are in one master list. So for example, you could have:

equines <- list()
equines$horse <- myfunction(df)
equines$pony <- myfunction(df)

And now you have only a single object in your global workspace, but you can access each data frame in a consistent and easy-to-remember way, for example:

equines$pony$HL
#>   Height Weight    FurColor
#> 3      2    200 light_Brown
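A nested list like this can also be built in one step by applying the function over a named list of inputs. The sketch below assumes a condensed stand-in for myfunction and two toy data frames, Horse and Pony, invented for the example.

```r
# Condensed stand-in for myfunction, returning a named list of data frames
myfunction <- function(x) {
  x <- x[!is.na(x$Height), ]
  heavy <- x$Weight >= 150
  list(HL = x[heavy & x$FurColor == "light_Brown", ],
       HD = x[heavy & x$FurColor == "Dark_Brown", ],
       LL = x[!heavy & x$FurColor == "light_Brown", ],
       LD = x[!heavy & x$FurColor == "Dark_Brown", ])
}

# Toy inputs (illustrative values only)
Horse <- data.frame(Height = c(2, 2), Weight = c(100, 200),
                    FurColor = c("Dark_Brown", "light_Brown"))
Pony  <- data.frame(Height = c(1, 1), Weight = c(90, 160),
                    FurColor = c("light_Brown", "Dark_Brown"))

# One call per input; the names of the input list become the top-level names
equines <- lapply(list(horse = Horse, pony = Pony), myfunction)

names(equines)
equines$pony$HD

# Row counts for all eight data frames at once
sapply(equines, function(group) sapply(group, nrow))
```

Adding another animal then only means adding one more named element to the input list, with no new objects in the global workspace.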