Obtaining Percentage for Date Observations

Time:12-02

I am very new to R and am struggling with this concept. I have a data frame that looks like this: [screenshot of the data frame not available]

I have used summary(FoodFacilityInspections$DateRecent) to get the number of observations for each date. However, there are 3,932 observations, and I want a summary of:

  • The dates with the most observations, and the percentage of all observations each one accounts for
  • The percentage of observations for each value of DateRecent

I have tried:

count(FoodFacilityInspections$DateRecent)
Error in UseMethod("count") :
  no applicable method for 'count' applied to an object of class "factor"
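The error most likely comes from dplyr's count(), which is a generic with a method for data frames rather than for a single factor column (an assumption: the question does not say which package supplied count). Base R's table() works directly on a factor. A minimal sketch, using a small hypothetical stand-in for FoodFacilityInspections because the real data was not posted:

```r
# Hypothetical stand-in for FoodFacilityInspections; the real data was not posted
FoodFacilityInspections <- data.frame(
  DateRecent = factor(c("5/18/2021", "5/18/2021", "7/8/2021", "5/13/2021"))
)

# Count observations per date, most frequent dates first
date_counts <- sort(table(FoodFacilityInspections$DateRecent), decreasing = TRUE)

# Percentage of all observations that fall on each date
date_percent <- prop.table(date_counts) * 100

summary_df <- data.frame(
  DateRecent = names(date_counts),
  count = as.vector(date_counts),
  percent = as.vector(date_percent)
)
summary_df
```

On the full data set, head(summary_df) would list the busiest dates together with their share of all observations.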

Answer:

Using built-in data, since you did not provide example data:

library(data.table)
dtcars <- data.table(mtcars, keep.rownames = TRUE)

Solution

# .N is the number of rows in each cyl group; nrow(dtcars) is the overall total
dtcars[, .(count = .N, percent = .N / nrow(dtcars) * 100),
       by = cyl]
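The same data.table pattern can be pointed at the DateRecent column and ordered so the busiest dates come first. A sketch using a small hypothetical stand-in data set (the question's real data was not posted):

```r
library(data.table)

# Hypothetical stand-in for the question's data
inspections <- data.table(
  DateRecent = c("5/18/2021", "5/18/2021", "7/8/2021", "5/13/2021", "5/18/2021")
)

# Count rows per date
date_summary <- inspections[, .(count = .N), by = DateRecent]

# Add each date's share of all rows (modified in place by :=)
date_summary[, percent := count / sum(count) * 100]

# Most frequent dates first
date_summary <- date_summary[order(-count)]
date_summary
```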

Answer:

You can use the table function to count how often each date occurs, which also shows which date occurs most often. Then you can loop through each entry in the table (each date, in your case) and divide its count by the total number of rows, as below (again using the mtcars dataset):

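cyl_counts <- table(mtcars$cyl)   # number of cars for each cylinder count
cyl_counts

percent <- numeric(length(cyl_counts))
for (i in seq_along(cyl_counts)) {
    # share of all rows that have this cylinder count
    percent[i] <- cyl_counts[i] / nrow(mtcars) * 100
}
output <- cbind(count = cyl_counts, percent)
output

  count percent
4    11  34.375
6     7  21.875
8    14  43.750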
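The loop is not strictly needed: prop.table() divides every count in a table by the table's total in one step, and which.max() picks the most frequent value. A vectorized sketch of the same mtcars example:

```r
cyl_counts <- table(mtcars$cyl)               # number of cars per cylinder count

# Divide every count by the total number of rows in one vectorized step
cyl_percent <- prop.table(cyl_counts) * 100

output <- cbind(count = cyl_counts, percent = cyl_percent)
output

# The most frequent value and its share of all rows
most_common <- names(which.max(cyl_counts))
most_common                                   # [1] "8"
cyl_percent[most_common]
```

With the question's data, mtcars$cyl would be replaced by FoodFacilityInspections$DateRecent.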

Answer:

A one-liner using table() and proportions() inside within():

within(as.data.frame.table(with(mtcars, table(cyl))), Pc <- proportions(Freq)*100)
#   cyl Freq     Pc
# 1   4   11 34.375
# 2   6    7 21.875
# 3   8   14 43.750
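proportions() was added in R 4.0.0 as a newer name for prop.table(), so prop.table() is the drop-in substitute on older versions. To list the most frequent values first, the result can then be ordered by Freq. A sketch on the same mtcars data:

```r
# Frequency table as a data frame, with a percentage column
cyl_freq <- within(as.data.frame.table(with(mtcars, table(cyl))),
                   Pc <- prop.table(Freq) * 100)   # prop.table also works before R 4.0

# Sort rows so the most common value comes first
cyl_freq <- cyl_freq[order(cyl_freq$Freq, decreasing = TRUE), ]
cyl_freq
```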

Answer:

An updated solution, based on your data, that builds a table with counts, a running total, percentages, and cumulative percentages.

library(dplyr)   # provides %>%, group_by(), summarise(), n() and mutate()

data<-data.frame("ScoreRecent"=c(100,100,100,100,100,100,100,100,100),
                 "DateRecent"=c("7/23/2021", "7/8/2021","5/25/2021","5/19/2021","5/20/2021","5/13/2021","5/17/2021","5/18/2021","5/18/2021"),
                 "Facility_Type_Description"=c("Retail Food Stores", "Retail Food Stores","Food Service Establishment","Food Service Establishment","Food Service Establishment","Food Service Establishment","Food Service Establishment","Food Service Establishment","Food Service Establishment"),
                 "Premise_zip"=c(40207,40207,40207,40206,40207,40206,40207,40206,40206),
                 "Opening_Date"=c("6/27/1988","6/29/1988","10/20/2009","2/28/1989","10/20/2009","10/20/2009","10/20/2009","10/20/2009", "10/20/2009"))


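tab <- function(dataset, var){
  
  dataset %>%
    group_by({{var}}) %>%                             # one group per value of var
    summarise(n = n()) %>%                            # observations per value
    mutate(total = cumsum(n),                         # running total of observations
           percent = n / sum(n) * 100,                # share of all observations
           cumulativepercent = cumsum(n / sum(n) * 100))  # running share
  
}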

tab(data, Facility_Type_Description)

  Facility_Type_Description      n total percent cumulativepercent
  <chr>                      <int> <int>   <dbl>             <dbl>
1 Food Service Establishment     7     7    77.8              77.8
2 Retail Food Stores             2     9    22.2             100  
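For the first part of the question (which dates have the most observations), the same idea can be applied to DateRecent and sorted by frequency. The sketch below is a hypothetical variant, tab_sorted(), that uses dplyr's count() with sort = TRUE; it reuses the DateRecent values from the sample data above:

```r
library(dplyr)

# DateRecent values from the nine-row sample above
data <- data.frame(
  DateRecent = c("7/23/2021", "7/8/2021", "5/25/2021", "5/19/2021", "5/20/2021",
                 "5/13/2021", "5/17/2021", "5/18/2021", "5/18/2021")
)

# Hypothetical helper: frequencies sorted from most to least common
tab_sorted <- function(dataset, var) {
  dataset %>%
    count({{ var }}, sort = TRUE) %>%          # n per value, largest first
    mutate(percent = n / sum(n) * 100,         # share of all observations
           cumulativepercent = cumsum(percent))
}

res <- tab_sorted(data, DateRecent)
res
```

Because DateRecent is stored as text, tied dates are not listed in calendar order; converting the column with as.Date(DateRecent, format = "%m/%d/%Y") first would make any date ordering chronological.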