How to 'summarize' variable which mixed by 'numeric' and 'character'

Time:12-15

I have the data frame `data` below. How can I transform it into `wished_data`, summing the numeric values within each category while keeping the letters as they are? Thanks!

library(tidyverse)
data <- data.frame(category=c('a','b','a','b','a'),
                      values=c(1,'A','2','4','B'))

# The code below does not work: values is a character column, so is.numeric(values) is always FALSE
data %>% group_by(category ) %>% 
  summarize(sum=if_else(is.numeric(values)>0,sum(is.numeric(values)),paste0(values)))
  
# Desired result
wished_data <- data.frame(category=c('a','a','b','b'),
           values=c('3','B','A','4'))

Answer:

I'd create a separate column to group numeric values in a category separately from characters.

data %>%
  mutate(num_check = grepl("[0-9]", values)) %>%
  group_by(category, num_check) %>%
  summarize(sum = ifelse(
    unique(num_check),
    as.character(sum(as.numeric(values))),
    unique(values)
  ), .groups = "drop")
#> # A tibble: 4 × 3
#>   category num_check sum  
#>   <chr>    <lgl>     <chr>
#> 1 a        FALSE     B    
#> 2 a        TRUE      3    
#> 3 b        FALSE     A    
#> 4 b        TRUE      4
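A variant of the same grouping idea, sketched under one assumption: a value counts as numeric exactly when `as.numeric()` can parse it, rather than whenever it contains a digit (which is what `grepl("[0-9]", values)` tests, so a string like `"A1"` would be misclassified). The helper columns `num` and `is_num` are illustrative names, and joining several letters with `", "` is likewise an assumption, since the question only has one letter per category.

```r
library(dplyr)

# stringsAsFactors = FALSE guards against older R versions, where
# as.numeric() on a factor would return level codes instead of values
data <- data.frame(category = c('a', 'b', 'a', 'b', 'a'),
                   values   = c(1, 'A', '2', '4', 'B'),
                   stringsAsFactors = FALSE)

result <- data %>%
  # parse once; letters become NA (the coercion warning is expected)
  mutate(num    = suppressWarnings(as.numeric(values)),
         is_num = !is.na(num)) %>%
  group_by(category, is_num) %>%
  summarise(values = if (first(is_num)) {
              as.character(sum(num))        # numeric group: sum it
            } else {
              paste(values, collapse = ", ") # assumption: join letters
            },
            .groups = "drop") %>%
  select(category, values)

result
```

Because `is_num` is constant within each group, testing `first(is_num)` avoids the vectorised `ifelse()`, which evaluates both branches and triggers coercion warnings on the letter groups.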

Answer:

Mixing numeric and character values in one column is not tidy. Consider giving each type its own column, for example:

data %>%
  mutate(letters = str_extract(values, "[A-Z]"),
         # "\\d+" (not "\\d") so multi-digit numbers such as "12" are extracted whole
         numbers = as.numeric(str_extract(values, "\\d+"))) %>%
  group_by(category) %>%
  summarise(values = sum(numbers, na.rm = TRUE),
            letters = na.omit(letters))

  category values letters
  <chr>     <dbl> <chr>  
1 a             3 B      
2 b             4 A  

In R, arithmetic on strings is not defined: `"1" + "1"` is an error rather than `2`, and `is.numeric("1")` returns `FALSE`. A workaround is to convert the values to a list, or to give each type its own column.
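The points above can be checked directly in a console. This is a small illustrative sketch; `x`, `nums` and `lst` are hypothetical names, and the error text shown in the comment is R's English message (it may be translated in other locales).

```r
# The question's values are stored as character strings
x <- c("1", "A", "2")

is.numeric("1")   # FALSE: "1" is a string, not a number

# Arithmetic on strings fails instead of returning 2
res <- tryCatch("1" + "1", error = function(e) conditionMessage(e))
res               # "non-numeric argument to binary operator"

# Converting first makes arithmetic possible; letters become NA
nums <- suppressWarnings(as.numeric(x))
nums              # 1 NA 2
sum(nums, na.rm = TRUE)   # 3

# A list can hold a number and a string side by side
lst <- list(3, "B")
sapply(lst, class)        # "numeric" "character"
```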

Answer:

Here is a somewhat messy answer:

library(dplyr)

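bind_rows(
  # rows whose value is not a number (the letters) are kept unchanged
  data %>%
    filter(is.na(suppressWarnings(as.numeric(values)))),
  # numeric rows are summed per category, then converted back to character
  data %>%
    mutate(values = suppressWarnings(as.numeric(values))) %>%
    group_by(category) %>%
    summarise(values = as.character(sum(values, na.rm = TRUE)))) %>%
  arrange(category)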

   category values
#1        a      B
#2        a      3
#3        b      A
#4        b      4