Home > Software engineering >  Make a sentence to summarize a simple data frame
Make a sentence to summarize a simple data frame

Time:05-24

I have a simple df as below, and I would like to create a summary notation for it. what will be the most effective way to build it? Could anyone guide me on this?

enter image description here

The summary that I would like to build is: There are 2 students in ELA: G8-01, G9-08; There are 2 students in MATH: G8-09, G9-06; There is 1 student in ART: G9-04.

structure(list(ID = c("G8-01", "G8-09", "G9-08", "G9-04", "G9-05", 
"G9-06", "G9-07"), ELA = c("G8-01", NA, "G9-08", NA, NA, NA, 
NA), MATH = c(NA, "G8-09", NA, NA, NA, "G9-06", NA), PE = c(NA, 
NA, NA, NA, NA, NA, NA), ART = c(NA, NA, NA, "G9-04", NA, NA, 
NA)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

A tidyverse solution using stringr::str_glue_data() to format a string:

library(tidyverse)

df %>%
  pivot_longer(-1, values_drop_na = TRUE) %>%
  group_by(name) %>%
  summarise(n = n(), id = toString(value)) %>%
  str_glue_data("There {ifelse(n>1, 'are', 'is')} {n} student{ifelse(n>1, 's', '')} in {name}: {id};")

which returns

# There is 1 student in ART: G9-04;
# There are 2 students in ELA: G8-01, G9-08;
# There are 2 students in MATH: G8-09, G9-06;

CodePudding user response:

You would typically do this with cat. You probably want to map the columns and their names together, and for tidiness put it inside a little function:

report <- function(data) {
  Map(function(x, nm) {
    cat('There are ', sum(!is.na(x)), " students in ", nm, ": ",
        paste(x[!is.na(x)], collapse = ', '), '\n', sep = '')
  }, x = data[-1], nm = names(data)[-1])
  invisible(NULL)
}

This results in:

report(df)
#> There are 2 students in ELA: G8-01, G9-08
#> There are 2 students in MATH: G8-09, G9-06
#> There are 0 students in PE:
#> There are 1 students in ART: G9-04

CodePudding user response:

You can use pluralize() from the package cli.

library(cli)
library(dplyr)
library(purrr)

df %>% 
  select(-ID) %>% 
  map(discard, is.na) %>% 
  compact() %>% 
  iwalk(~ cat(pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"), sep = "\n"))

Which gives the following:

There are 2 students in ELA: G8-01 and G9-08
There are 2 students in MATH: G8-09 and G9-06
There is 1 student in ART: G9-04

You can tweak this to return the text if you want to use it elsewhere. I used cat() in this example to just print it to the console.

For example to save the text:

txt <- df %>% 
  select(-ID) %>% 
  map(discard, is.na) %>% 
  compact() %>% 
  imap_chr(~ pluralize("There {qty(length(.x))}{?is/are} {length(.x)} student{?s} in {qty(.y)}{.y}: {qty(.x)}{.x}"))

unname(txt)
# [1] "There are 2 students in ELA: G8-01 and G9-08" 
# [2] "There are 2 students in MATH: G8-09 and G9-06"
# [3] "There is 1 student in ART: G9-04"             

CodePudding user response:

The previous answer is already good if you want a report for every subject. If you just want to get automatically the line as you said you can use:

First, create a summary with all the subjects, number students and the codes:

example = example %>% 
  pivot_longer(cols=c(-ID),names_to='Subject',values_to='Code') %>% 
  filter(! is.na(Code)) %>% 
  group_by(Subject) %>% 
  summarise(n_students = n(),
            Codes = paste0(Code, collapse=', '))

Put everything together:

lapply(example, 
           function(i) paste0(paste("There are",example$n_students,"students in",example$Subject,":",example$Codes),
                              collapse='; '))[[1]]

Output:

[1] "There are 1 students in ART : G9-04; There are 2 students in ELA : G8-01, G9-08; There are 2 students in MATH : G8-09, G9-06"

Maybe lapply is not the most elegant way, but, it works. Also, you can apply as.factor to Subjects and create the levels to sort the sentence as you want.

  • Related