Creating a variable with an existing function [i.e., max()] within a user-defined R function

I'm sorry if this is a repeated question, but I can't find any answers that I can understand.

I'm trying to make a simple user-defined function (my first attempt) to more easily explore a much larger dataset. The data have this basic form:

ID <- c("A","A","A","A","A","B","B","B","B","B")
x <- seq(1:10)
y <- sample(1:100, 10)
z <- sample(1:100, 10)
data <- data.frame(x,y,z)

Here is a simple function that works, but only partly. In short, I'd like to be able to specify the ID in the function call and make a two-variable scatterplot for the data associated with that ID:

summary_plot <- function(ID_code) {
  plot(data$x[data$ID==ID_code],data$y[data$ID==ID_code], xlim=c(0,10))
  points(data$x[data$ID==ID_code],data$z[data$ID==ID_code], col="red")
}

summary_plot("A")

As you can see, I'm plotting variable y first and then adding the points for variable z. The problem is that sometimes the values of z fall outside the y-axis range of the first plot, which is determined by variable y alone (this is not a problem for x, as it has fixed bounds).

My crude solution was to find the maximum of both values using the max() function, and set the ylim of the first plot to that value, m.

summary_plot <- function(ID_code) {

max_y <- max(data$y[data$ID==ID_code]) ## max of y
max_z <- max(data$z[data$ID==ID_code]) ## max of z
maximum <- c(as.numeric(max_y),as.numeric(max_z)) # both in a vector
m <- (as.numeric(max(maximum))) #find the higher max

plot(data$x[data$ID==ID_code],data$y[data$ID==ID_code], xlim=c(0,10))
points(data$x[data$ID==ID_code],data$z[data$ID==ID_code], col="red", ylim=c(0,m)
}

All of this code works fine in the console, but does nothing when I execute the function. The plot is created, but with non-specified y values. When I check the value of m, I get 'Error: object 'm' not found'.

I would appreciate it if someone could explain the fundamentals of why this does not work, and also propose a workaround.

Thanks in advance!

Sean

CodePudding user response:

I think the logic of your code is mostly fine and the function just had a few slips. You forgot to close the parenthesis of points() in your function, and you forgot to include the ID variable in your data frame. In addition, ylim was passed to points(), which only adds to the existing plot and does not change its axes, so I moved it into the plot() call. Here is the revised code. See if this is what you were after:

set.seed(45)

ID <- c("A","A","A","A","A","B","B","B","B","B")
x <- seq(1:10)
y <- sample(1:100, 10)
z <- sample(1:100, 10)
data <- data.frame(x,y,z,ID)

summary_plot<-function(ID_code){
  max_y <- max(data$y[data$ID==ID_code]) ## max of y
  max_z <- max(data$z[data$ID==ID_code]) ## max of z
  maximum <- c(as.numeric(max_y),as.numeric(max_z)) # both in a vector
  m <- as.numeric(max(maximum)) #find the higher max
  print(m) # show the value; m itself exists only inside the function
  plot(data$x[data$ID==ID_code],data$y[data$ID==ID_code], xlim=c(0,10), ylim=c(0,m)) # set the axis range when the plot is created
  points(data$x[data$ID==ID_code],data$z[data$ID==ID_code], col="red")
}
  
summary_plot("A")

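
On the 'object 'm' not found' message: a variable created inside a function exists only while that call is running, so m is never visible at the console afterwards. To use the value outside, the function has to return it and the caller has to store it. Here is a minimal sketch of that, assuming the same data frame as above. The helper name higher_max is made up for illustration and is not part of the original code; it skips the plotting so only the scoping is shown:

```r
set.seed(45)
data <- data.frame(
  x  = 1:10,
  y  = sample(1:100, 10),
  z  = sample(1:100, 10),
  ID = rep(c("A", "B"), each = 5)
)

# Hypothetical helper (an assumption, not from the original post):
# same max logic as summary_plot(), without the plotting
higher_max <- function(ID_code) {
  d <- data[data$ID == ID_code, ]
  m <- max(d$y, d$z)   # m is created in this call's own environment
  m                    # the last expression is the return value
}

# Store the returned value under any name you like
m_A <- higher_max("A")
m_A

# m was local to the call, so the global environment has no m
exists("m", envir = globalenv(), inherits = FALSE)   # FALSE in a fresh session
```

The same idea applies to summary_plot(): if it ends with m, then writing m_A <- summary_plot("A") draws the plot and keeps the value.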

CodePudding user response:

Here is a simpler version of the function. Subsetting on ID_code just once, right at the beginning, makes everything simpler and more readable.
Setting ylim when the plot is first drawn makes the y-axis start and end at the given values; setting it afterwards in points() has no effect.

summary_plot<-function(ID_code){
  d <- data[data$ID == ID_code, ]
  max_y <- max(d$y)           # max of y
  max_z <- max(d$z)           # max of z
  maximum <- c(max_y, max_z)  # both in a vector
  m <- max(maximum)           # find the higher max
  plot(d$x, d$y, xlim = c(0, 10), ylim = c(0, m))
  points(d$x, d$z, col = "red")
  m                           # return m
}

summary_plot("A")
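
As a possible further simplification, range() can compute both axis limits in one step from any number of vectors. The sketch below assumes the same data frame as above. The function name summary_plot_range and the choice to return the limits with invisible() are illustrative assumptions, not part of the answers above:

```r
set.seed(45)
data <- data.frame(
  x  = 1:10,
  y  = sample(1:100, 10),
  z  = sample(1:100, 10),
  ID = rep(c("A", "B"), each = 5)
)

# Hypothetical variant of summary_plot(); name and return value are assumptions
summary_plot_range <- function(ID_code) {
  d <- data[data$ID == ID_code, ]
  # range() returns c(min, max) over all its arguments; including 0
  # keeps the lower limit at 0 because the data here are all positive
  y_lim <- range(0, d$y, d$z)
  plot(d$x, d$y, xlim = c(0, 10), ylim = y_lim)   # limits fixed when the plot opens
  points(d$x, d$z, col = "red")                   # added points reuse those axes
  invisible(y_lim)                                # return the limits without printing
}
```

Calling lims <- summary_plot_range("A") draws the plot and stores the two limits in lims, so they can be inspected at the console even though y_lim itself only exists inside the function.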