Visualizing temporary results in R using table and summary with the name of the variables

Time:05-24

I am analyzing several variables that I'm creating from a big database. They are mostly dummies or categorical, and they usually come in PAIRS, and they are part of a much larger data frame.

For each pair of variables, I want to print a clean summary:

  • Two tables: each one with the frequency of each value (which includes NA even when it's 0);
  • A summary with the mean of both

Something like this:

Var01:
    0     1  <NA> 
50395 40292     0 

Var02:
    0     1  <NA> 
13757 76930     0 

Means:
  Var01  Var02
1 68.39% 96.39%

I just need to see these results once, not to save them.

The names of the variables are actually complicated (for instance: dm_idade_0a17_pre), and I didn't want to copy and paste them too many times as I was doing before.

I tried to do this by creating temporary variables and using the functions table() and summary(). I used a custom function, percent(), to show the means as percentages. The problem is just that table() isn't showing me the NAME of the variable.

So, my code is something like this:

###########

# CUSTOM FUNCTION

percent <- function(x, digits = 3, format = "f", ...) {
  paste0(formatC(x * 100, format = format, digits = digits, ...), "%")
}

# ORIGINAL DATA FRAME

df <- data.frame(
  ch_name = letters[1:5],
  ch_key = c(1:5))

# 1st new variable = 
df$ab_cd <- sample(0:1,5,replace = TRUE)

# 2nd new variable = 
df$ab_cd_e <- sample(0:1,5,replace = TRUE)


# CREATING TEMPORARY VARIABLES

{
  x1 <- df$ab_cd
  x2 <- df$ab_cd_e
  
  y1 <- table(x1, useNA = 'always')
  y2 <- table(x2, useNA = 'always')
  
  z1 <- data.frame(
    "ab_cd" = percent(mean(x1)),
    "ab_cd_e" = percent(mean(x2)))

#  PRINTING THEM
  
  cat("\014")
  print(y1)
  print(y2)
  z1
}
###########

The result I would get is this:

x1
   0    1 <NA> 
   2    3    0 
x2
   0    1 <NA> 
   3    2    0 
    ab_cd ab_cd_e
1 60.000% 40.000%

If the names of the variables x1 and x2 were the original names of the columns I used, my problem would be solved (it's ugly, but better than nothing).

Thank you all for your attention!

(Please: this might look lazy, but bear in mind that I still need to do this over 80 times. The variable names are also very similar to each other, which makes CTRL+F or double-clicking too slow. Hope you all understand!)

CodePudding user response:

You could do something like this:

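f <- function(s1, s2) {
  # Print each column name, then its frequency table (the NA count is always shown).
  # deparse.level = 0 leaves the table's header line blank, so cat() supplies the name.
  cat(s1)
  print(table(df[[s1]], useNA = 'always', deparse.level = 0))
  cat(s2)
  print(table(df[[s2]], useNA = 'always', deparse.level = 0))
  # Return a one-row data frame of the means, labelled with the original column names.
  setNames(
    data.frame(percent(mean(df[[s1]], na.rm = TRUE)), percent(mean(df[[s2]], na.rm = TRUE))),
    c(s1, s2)
  )
}

Usage:

f("ab_cd", "ab_cd_e")

Output:

ab_cd
   0    1 <NA> 
   1    4    0 
ab_cd_e
   0    1 <NA> 
   3    2    0 
    ab_cd ab_cd_e
1 80.000% 40.000%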
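
Since this has to be repeated for 80-odd pairs, a variant that takes the data frame as an argument and loops over a list of pairs may be handy. The following is only a sketch: show_pair() and the pairs list are hypothetical names, not part of the answer above, and the example columns are filled with fixed values purely for illustration. It labels each table through table()'s dnn argument instead of cat().

```r
# Same percent() helper as in the question.
percent <- function(x, digits = 3, format = "f", ...) {
  paste0(formatC(x * 100, format = format, digits = digits, ...), "%")
}

# Hypothetical helper: prints one labelled frequency table per column,
# then a one-row data frame with the means as percentages.
show_pair <- function(data, s1, s2) {
  cols <- c(s1, s2)
  for (s in cols) {
    # dnn puts the original column name in the table header
    print(table(data[[s]], useNA = "always", dnn = s))
  }
  means <- vapply(cols,
                  function(s) percent(mean(data[[s]], na.rm = TRUE)),
                  character(1))
  # check.names = FALSE keeps the column names exactly as given
  print(data.frame(as.list(means), check.names = FALSE))
  invisible(means)
}

# Illustrative data (fixed values instead of sample(), so runs are repeatable)
df <- data.frame(ch_name = letters[1:5], ch_key = 1:5)
df$ab_cd   <- c(0, 1, 1, 1, 0)
df$ab_cd_e <- c(1, 0, 0, 1, 0)

# One entry per pair of column names; add the remaining pairs here
pairs <- list(c("ab_cd", "ab_cd_e"))
for (p in pairs) show_pair(df, p[1], p[2])
```

Passing the data frame explicitly avoids relying on a global df, and each variable name only has to be typed once, in the pairs list.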