Different Cramer's V result on same dataframe when using Shiny package

Time:09-20

I am hoping someone can help me.

I am computing Cramer's V for pairs of categorical variables in R. Here's an example of the code:

library(rcompanion)  # provides cramerV()

# Cramer's V
df1 <- subset(ACCIDENT_MASTER_single, select = c("SEVERITY", "ATMOSPH_COND"))

# Convert to a numeric matrix
df3 <- data.matrix(df1)

# Calculate Cramer's V
cramerV(df3)
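One plausible explanation, assuming rcompanion's cramerV(x, y = NULL) behaves as its documentation describes (x may be a two-way table or matrix, or a vector of observations paired with y): a matrix passed as x is read as a contingency table of counts. data.matrix(df1) is not such a table; it is an n-by-2 matrix with one row per accident and category codes as the cell values. The base-R sketch below uses made-up toy data and a hypothetical helper, cramers_v_from_table(), which applies V = sqrt(chi^2 / (N * (min(rows, cols) - 1))) via chisq.test(correct = FALSE), to show how differently the two inputs are read.

```r
# Hypothetical helper: Cramer's V of a two-way table (or matrix) of counts,
# V = sqrt(chi^2 / (N * (min(rows, cols) - 1))), without continuity correction.
cramers_v_from_table <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  unname(sqrt(chi2 / (n * (min(dim(tab)) - 1))))
}

# Toy data: two categorical variables observed on 8 records (made up).
toy <- data.frame(
  SEVERITY     = c(1L, 2L, 2L, 3L, 3L, 3L, 2L, 1L),
  ATMOSPH_COND = c("1", "1", "2", "2", "1", "3", "3", "1")
)

# data.matrix() turns the character column into factor codes: an 8 x 2 matrix,
# one row per record, whose cells are category codes rather than counts.
m <- data.matrix(toy)
dim(m)
sum(m)    # sum of the codes, not the number of records

# Cross-tabulating gives a 3 x 3 table of counts that sums to 8 records.
tab <- table(toy$SEVERITY, toy$ATMOSPH_COND)
sum(tab)

cramers_v_from_table(m)    # records treated as table rows: not V of the two variables
cramers_v_from_table(tab)  # V of SEVERITY vs ATMOSPH_COND
```

The first call treats each record as a row of the contingency table, so its value is not the association between the two variables; the second is what cross-tabulating the variables gives.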

I am using Shiny so that a user can select the categorical variables via dropdown menus and then see the resulting Cramer's V. The app runs, but the results it displays are completely different from those of the code above, even though both use the same data frame. Can anyone tell me why?

Here is an example of the R code using the Shiny package:

library(shiny)
library(rcompanion)  # provides cramerV()

df <- data.frame(ACCIDENT_MASTER_single)

Cat1.Variables <- c("SEVERITY", "ATMOSPH_COND", "DAY_OF_WEEK")
Cat2.Variables <- c("SEVERITY", "ATMOSPH_COND", "DAY_OF_WEEK")

ui <- fluidPage(
  titlePanel("Calculate the strength of the relationship between categorical variables"),
  sidebarLayout(
    sidebarPanel(
      selectInput("cat1", choices = Cat1.Variables, label = "Select a Categorical Variable:"),
      selectInput("cat2", choices = Cat2.Variables, label = "Select a Categorical Variable:")
    ),
    mainPanel(
      # renderPrint() output belongs in verbatimTextOutput(), not tableOutput()
      verbatimTextOutput("results")
    )
  )
)

server <- function(input, output) {
  # Contingency table of the two selected variables
  cramerdata <- reactive({
    req(input$cat1, input$cat2)
    table(df[[input$cat1]], df[[input$cat2]])
  })

  output$results <- renderPrint({
    cat("\nThe results equal:\n")
    print(cramerV(cramerdata()))
  })
}

shinyApp(ui, server)
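To compare the app with the non-Shiny code, the reactive's computation can be run in plain R. A minimal sketch on made-up data, where cramer_for() is a hypothetical helper that builds the same table(df[[var1]], df[[var2]]) as cramerdata() and then applies the usual Cramer's V formula in base R. It also checks that feeding the two raw columns to chisq.test(x, y), which cross-tabulates them internally, gives the same value. That two-vector form corresponds to cramerV(x, y) in rcompanion, although rcompanion itself is not used here.

```r
# Hypothetical helper mirroring the app's reactive: cross-tabulate two columns,
# then compute Cramer's V = sqrt(chi^2 / (N * (min(rows, cols) - 1))).
cramer_for <- function(df, var1, var2) {
  tab <- table(df[[var1]], df[[var2]])
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Made-up data using the same column names as the app's dropdowns.
set.seed(42)
df <- data.frame(
  SEVERITY     = sample(1:3, 200, replace = TRUE),
  ATMOSPH_COND = sample(c("1", "2", "3", "7"), 200, replace = TRUE),
  DAY_OF_WEEK  = sample(1:7, 200, replace = TRUE)
)

v_table <- cramer_for(df, "SEVERITY", "ATMOSPH_COND")

# Same statistic from the two raw columns: chisq.test(x, y) builds the
# contingency table itself, so the counts (and therefore V) should match.
chi2_xy <- suppressWarnings(
  chisq.test(df$SEVERITY, df$ATMOSPH_COND, correct = FALSE)$statistic
)
k <- min(length(unique(df$SEVERITY)), length(unique(df$ATMOSPH_COND)))
v_vectors <- unname(sqrt(chi2_xy / (nrow(df) * (k - 1))))

all.equal(v_table, v_vectors)
```

Any other pair from the dropdowns can be checked the same way, for example cramer_for(df, "DAY_OF_WEEK", "SEVERITY").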

I have also tested this on a number of other variable pairs, and the results differ for every one of them, not just for the two variables in this example. Any help would be appreciated!
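Because the mismatch appears for every pair tried, a quick way to confirm the pattern is to compute both versions for all pairs of the dropdown variables. A sketch on made-up data with hypothetical helpers: v_from_table() cross-tabulates first, as the Shiny app does, while v_from_matrix() hands data.matrix() of the two columns straight to chisq.test(), which, like a matrix passed to cramerV() (assuming the documented behavior), is read as a table of counts.

```r
# Cramer's V from any two-way table or matrix of counts (no continuity correction).
v_from_counts <- function(counts) {
  chi2 <- suppressWarnings(chisq.test(counts, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(counts) * (min(dim(counts)) - 1))))
}

# Path used by the Shiny app: cross-tabulate the two columns first.
v_from_table <- function(df, a, b) v_from_counts(table(df[[a]], df[[b]]))

# Path used by the non-Shiny code: the n x 2 matrix of codes itself.
v_from_matrix <- function(df, a, b) v_from_counts(data.matrix(df[c(a, b)]))

# Made-up data with the same column names as the dropdowns.
set.seed(1)
df <- data.frame(
  SEVERITY     = sample(1:3, 300, replace = TRUE),
  ATMOSPH_COND = sample(c("1", "2", "3"), 300, replace = TRUE),
  DAY_OF_WEEK  = sample(1:7, 300, replace = TRUE)
)

# Every unordered pair of variables, one pair per column.
var_pairs <- combn(c("SEVERITY", "ATMOSPH_COND", "DAY_OF_WEEK"), 2)
results <- data.frame(
  var1     = var_pairs[1, ],
  var2     = var_pairs[2, ],
  v_table  = apply(var_pairs, 2, function(p) v_from_table(df, p[1], p[2])),
  v_matrix = apply(var_pairs, 2, function(p) v_from_matrix(df, p[1], p[2]))
)
print(results)
```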

EDIT: Someone suggested I post the output of dput(head(ACCIDENT_MASTER_single)), so it is below (the full dataset is very large). I hope this helps!

 > dput(head(ACCIDENT_MASTER_single))
structure(list(ACCIDENT_NO = c("T20150000004", "T20150000017", 
"T20150000020", "T20150000028", "T20150000034", "T20150000052"
), ACCIDENTDATE = c("2015-01-01", "2015-01-01", "2015-01-01", 
"2015-01-01", "2015-01-01", "2015-01-01"), ACCIDENTTIME = c("02:10:00", 
"07:20:00", "06:51:00", "07:55:00", "17:10:00", "01:20:00"), 
    ACCIDENT_TYPE = c(2L, 1L, 4L, 1L, 4L, 1L), DAY_OF_WEEK = c(5L, 
    5L, 5L, 4L, 5L, 5L), DCA_CODE = c(108L, 130L, 173L, 135L, 
    171L, 121L), DIRECTORY = c("MEL", "MEL", "MEL", "MEL", "MEL", 
    "MEL"), LIGHT_CONDITION = c(3L, 1L, 2L, 1L, 1L, 3L), ROAD_GEOMETRY = c(5L, 
    4L, 1L, 5L, 5L, 1L), SEVERITY = c(3L, 2L, 1L, 3L, 3L, 2L), 
    SPEED_ZONE = c(60L, 70L, 70L, 100L, 60L, 60L), ROAD_TYPE = c("ROAD", 
    "ROAD", "ROAD", "ROAD", "ROAD", "DRIVE"), ATMOSPH_COND = c("1", 
    "1", "1", "1", "1", "1"), ATMOSPH_COND_SEQ = c("1", "1", 
    "1", "0", "1", "1"), LGA_NAME = c("MOONEE VALLEY", "MONASH", 
    "BAYSIDE", "BRIMBANK", "MELTON", "BRIMBANK"), DEG_URBAN_NAME = c("MELB_URBAN", 
    "MELB_URBAN", "MELB_URBAN", "MELB_URBAN", "MELB_URBAN", "MELB_URBAN"
    ), Lat = c(-37.77922923, -37.88240078, -37.92909811, -37.76758102, 
    -37.72427767, -37.76316596), Long = c(144.9309415, 145.0903658, 
    145.0028103, 144.8002374, 144.7529804, 144.7897546), POSTCODE_NO = c(3032L, 
    3148L, 3186L, 3022L, 3023L, 3023L), Surface.Cond.Desc = c("Dry", 
    "Dry", "Dry", "Dry", "Dry", "Dry"), SURFACE_COND = c("1", 
    "1", "1", "1", "1", "1"), SURFACE_COND_SEQ = c("1", "1", 
    "1", "0", "1", "1"), ROAD_SURFACE_TYPE = c("1", "1,1", "1", 
    "1,1", "1", "1,1"), VEHICLE_TYPE = c("99", "5,2", "1", "1,62", 
    "1", "1,1"), TRAFFIC_CONTROL = c("0", "1,1", "0", "0,0", 
    "0", "1,1"), EVENT_TYPE = c("C", "C", "3,C", "C,3,C,3,C", 
    "3,C", "C"), SEX = c("M,U", "M,M", "M", "F,U", "M", "M,M,M,F"
    ), AGE = c("32,NA", "56,43", "28", "54,NA", "23", "17,16,19,41"
    ), Age.Group = c("30-39,unknown", "50-59,40-49", "26-29", 
    "50-59,unknown", "22-25", "16-17,16-17,17-21,40-49"), INJ_LEVEL = c("3,4", 
    "2,3", "1", "3,4", "3", "2,4,4,3"), ROAD_USER_TYPE = c("1,9", 
    "2,2", "2", "2,2", "2", "3,3,2,2")), row.names = c(NA, 6L
), class = "data.frame")
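The six rows from the dput() output above are enough to see the effect numerically. The sketch copies only SEVERITY and DAY_OF_WEEK from them (ATMOSPH_COND is "1" in all six rows, so its table has a single column and V is undefined) and computes V both ways with a hypothetical base-R helper, v_from_counts(). Six rows are far too few for a meaningful chi-square; this only shows that the two inputs give different numbers.

```r
# Hypothetical helper: Cramer's V from a two-way table or matrix of counts.
v_from_counts <- function(counts) {
  chi2 <- suppressWarnings(chisq.test(counts, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(counts) * (min(dim(counts)) - 1))))
}

# Two columns copied from dput(head(ACCIDENT_MASTER_single)) above.
rows <- data.frame(
  SEVERITY    = c(3L, 2L, 1L, 3L, 3L, 2L),
  DAY_OF_WEEK = c(5L, 5L, 5L, 4L, 5L, 5L)
)

tab <- table(rows$SEVERITY, rows$DAY_OF_WEEK)  # 3 x 2 table of counts, total 6
v_table <- v_from_counts(tab)                  # sqrt(1.2 / 6), about 0.447

m <- data.matrix(rows)                         # 6 x 2 matrix of codes, total 43
v_matrix <- v_from_counts(m)                   # a different, unrelated number

c(v_table = v_table, v_matrix = v_matrix)
```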

Thanks

Answer:

It works for me. Also try setting the seed with set.seed(1).

  cramerdata <- reactive({
    req(input$cat1, input$cat2)
    df3 <- data.matrix(ACCIDENT_MASTER_single[c(input$cat1, input$cat2)])
    df3
  })
  
  output$results <- renderPrint({
    cat(sprintf("\nThe results equal: \n"))
    print(cramerV(cramerdata()))
  })
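On the set.seed(1) suggestion: the chi-square statistic, and hence the point estimate of Cramer's V, involves no random numbers, so the seed cannot change it. As far as I can tell from the rcompanion documentation, randomness only enters cramerV() when ci = TRUE, where the confidence interval is bootstrapped. A base-R sketch with made-up data, a hypothetical helper v_from_vectors(), and a simple percentile bootstrap (boot_ci(), also hypothetical, standing in for whatever rcompanion does internally) shows the difference: the estimate is identical under any seed, while the bootstrap interval depends on it.

```r
# Hypothetical helper: Cramer's V from two categorical vectors (no continuity correction).
v_from_vectors <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Hypothetical percentile bootstrap CI: resample records with replacement.
boot_ci <- function(x, y, reps = 500, level = 0.95) {
  n <- length(x)
  boot_v <- replicate(reps, {
    idx <- sample.int(n, n, replace = TRUE)
    v_from_vectors(x[idx], y[idx])
  })
  alpha <- (1 - level) / 2
  unname(quantile(boot_v, c(alpha, 1 - alpha)))
}

# Fixed, made-up data (100 records) so only the seed varies below.
x <- rep(c("a", "b", "c"), times = c(40, 35, 25))
y <- rep(c("u", "v", "u", "v", "u"), times = c(30, 10, 15, 20, 25))

set.seed(1)
est1 <- v_from_vectors(x, y)
ci1 <- boot_ci(x, y)

set.seed(2)
est2 <- v_from_vectors(x, y)
ci2 <- boot_ci(x, y)

identical(est1, est2)  # the point estimate uses no random numbers
rbind(ci1, ci2)        # the bootstrap limits will generally differ between seeds
```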