Function to represent variables based on their type

Time:08-21

I have a data frame with continuous and categorical variables. The numerical variables are either integer or double, while the categorical ones are factors. I want to create a function that, depending on the type of variable, shows a table of frequencies (for factors) or a histogram (for continuous variables). A very simple example dataset similar to the real one:

data <- c(0,1,0,23,24,25,1,1,0)
dataset <- matrix(data, nrow=3, ncol=3, dimnames = list(c("patient#1", "patient#2", "patient#3"),c("gender", "age", "smoking")))
dataset <- data.frame(dataset)

dataset$gender <- as.factor(dataset$gender)
dataset$smoking <- as.factor(dataset$smoking)

I create a vector assigning 0 if the variable is continuous and 1 if it's categorical:

i <- c(1,3)
cat <- NULL
cat[i] <- 1
cat[is.na(cat)] <- 0
cat <- as.factor(cat)

Now I put the name of the variables in a new object and create my function:

dataset_n <- names(dataset)

f_display <- function(a,b,c){
              for (i in 1:length(b)){
                print(i)    # I included this just to check that the loop was performing well
                print(a[i])
                print(b[i])
                print(c[i])
                if (c[i] == 1){
                  print(table(a$b[i]))
                } else {
                  histogram(a$b[i])
                }
              }
            }

f_display(dataset,dataset_n,cat)

The problem is that the outputs of the function are NULL, so the tables are empty, and when trying to use histogram I get an error saying that histogram cannot be applied to an object of class NULL:

Here the console output:

[1] 1
          gender
patient#1      0
patient#2      1
patient#3      0
[1] "gender"
[1] 1
Levels: 0 1
< table of extent 0 >
[1] 2
          age
patient#1  23
patient#2  24
patient#3  25
[1] "age"
[1] 0
Levels: 0 1
Error in UseMethod("histogram") : 
  no applicable method for 'histogram' applied to an object of class "NULL"

I wrote the function because nothing else worked for selecting a column through a variable. With the function, it seems that at least the column names are picked up correctly, but I have no idea why the values (rows) are not there.

I'd appreciate any help (either telling me what is wrong in the code of my function or giving me any other way to display the values of the columns in different ways depending on if they are continuous or categorical).

Thank you

CodePudding user response:

I think you're making life more complicated for yourself than it needs to be. This gets you what you want:

f_display <- function(col) {
  if (is.factor(dataset[[col]])) {
    table(dataset[[col]])
  } else {
    hist(dataset[[col]], xlab=col, main=paste0("Histogram of ", col))
  }
}

lapply(names(dataset), f_display)
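
As for why the original loop printed empty tables, a likely explanation is that `$` does not evaluate what follows it: `a$b[i]` is read as `(a$b)[i]`, so R looks for a column literally named `b`, finds none, and returns NULL. `[[` takes a character value instead. A minimal sketch, assuming the same toy data as in the question (rebuilt here directly with `data.frame`; the variable `col` is only for illustration):

```r
# Same toy data as in the question, built directly as a data frame
dataset <- data.frame(
  gender  = factor(c(0, 1, 0)),
  age     = c(23, 24, 25),
  smoking = factor(c(1, 1, 0))
)

col <- "age"

# `$` treats `col` as a literal column name, which does not exist here
via_dollar <- dataset$col
is.null(via_dollar)        # TRUE

# table(NULL) is the "table of extent 0" seen in the question
length(table(via_dollar))  # 0

# `[[` evaluates `col` and returns the column it names
via_brackets <- dataset[[col]]
via_brackets               # 23 24 25
```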

To tidy things up a little, you can add names to the value returned by lapply:

answer <- lapply(names(dataset), f_display)
names(answer) <- names(dataset)
answer
$gender

0 1 
2 1 

$age
$breaks
[1] 23.0 23.5 24.0 24.5 25.0

$counts
[1] 1 1 0 1

$density
[1] 0.6666667 0.6666667 0.0000000 0.6666667

$mids
[1] 23.25 23.75 24.25 24.75

$xname
[1] "dataset[[col]]"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

$smoking

0 1 
1 2 

And the histogram looks like this: [Figure: Histogram of age]

To generalise the function to work on any data frame, you can do this:

f_display1 <- function(col, df) {
  if (is.factor(df[[col]])) {
    table(df[[col]])
  } else {
    hist(df[[col]], xlab=col, main=paste0("Histogram of ", col))
  }
}

lapply(names(dataset), f_display1, df=dataset)
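
The hand-built 0/1 vector from the question is probably not needed either, since the column classes can be queried directly. A small sketch along those lines, again on the question's toy data (the names `is_cat`, `cat_cols`, `num_cols` and `freq` are made up for this example):

```r
# Same toy data as in the question, built directly as a data frame
dataset <- data.frame(
  gender  = factor(c(0, 1, 0)),
  age     = c(23, 24, 25),
  smoking = factor(c(1, 1, 0))
)

# TRUE for factor columns, FALSE otherwise; column names are carried over
is_cat <- sapply(dataset, is.factor)

cat_cols <- names(dataset)[is_cat]    # categorical column names
num_cols <- names(dataset)[!is_cat]   # continuous column names

# Frequency tables for the categorical columns, returned as a named list
freq <- lapply(dataset[cat_cols], table)
freq
```

The continuous columns in `num_cols` can then be passed to `hist()` one at a time, as in the functions above.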