R tidyverse split strings by commas and calculate mean

Time:12-05

I have this df:

library(tidyverse)
library(magrittr)

df <- tibble(
  Time = c('June 7', 'June 8', 'June 9', 'June 10', 'June 11', 'June 12', 'June 13', 'June 14', 'June 15', 'June 16', 'June 17', 'June 18', 'June 19', 'June 20', 'June 21', 'June 22', 'June 23', 'June 24', 'June 25', 'June 26', 'June 27', 'June 28'),
  Measurements = c('105, 54, 79, 49, 31, 84, 55', '50, 105, 85, 72, 27, 43', '58, 26, 38', '67, 52, 92, 46', '73, 59, 62', '57, 24', '78, 96, 107', '76, 49, 40, 34, 44, 55', '18, 60, 39', '39, 55, 35', '86, 27, 91, 49, 23, 65, 32, 74', '32, 47, 57', '70, 56', '146, 39', '94, 39, 21, 72, 55', '48, 70, 10, 160', '126, 87, 107, 45, 55, 39', '33, 62, 38', '43, 63, 68, 21, 126, 87, 107', '56, 86, 64', '66, 55', '34, 44, 55, 72, 51, 42')
)

I want to split the values in Measurements by commas and calculate the mean for each row (rowwise).

I was able to split and convert to numeric:

df %>% lapply(str_split(.$Measurements, ', '), as.numeric)

But I didn't know how to proceed from here. Any help is appreciated!

Also, can I use purrr::map here instead of lapply?

CodePudding user response:

This is a possible approach:

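df %>%
  # keep the dates in their original order instead of sorting them alphabetically
  mutate(Time = factor(Time, levels = Time)) %>%
  # split into up to 10 columns (a to j); shorter rows are padded with NA
  separate(Measurements, sep = ",", into = letters[1:10], fill = "right") %>%
  pivot_longer(a:j) %>%
  na.omit() %>%
  mutate(value = as.numeric(value)) %>%
  group_by(Time) %>%
  summarise(mean = mean(value))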
# A tibble: 22 × 2
   Time     mean
   <fct>   <dbl>
 1 June 7   65.3
 2 June 8   63.7
 3 June 9   40.7
 4 June 10  64.2
 5 June 11  64.7
 6 June 12  40.5
 7 June 13  93.7
 8 June 14  49.7
 9 June 15  39  
10 June 16  43  
# … with 12 more rows
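
The approach above has to guess an upper bound on the number of values per row (10 columns). As a rough sketch of an alternative, tidyr's separate_rows() can go straight to long format without that guess. The name df_small is only an illustrative subset of the question's data, not something from the original post:

```r
library(tidyverse)

# Illustrative subset of the question's data (df_small is a made-up name).
df_small <- tibble(
  Time = c('June 9', 'June 12', 'June 15'),
  Measurements = c('58, 26, 38', '57, 24', '18, 60, 39')
)

result <- df_small %>%
  # one row per measurement; convert = TRUE turns the pieces into numbers
  separate_rows(Measurements, sep = ',\\s*', convert = TRUE) %>%
  # keep the dates in their original order when grouping
  mutate(Time = factor(Time, levels = unique(Time))) %>%
  group_by(Time) %>%
  summarise(mean = mean(Measurements))
```

This should work the same way on the full df, since the number of values per row no longer matters.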

CodePudding user response:

A hacky solution...

# split each row once, without overwriting df
parts <- str_split(df$Measurements, ', ')
means <- numeric(length(parts))
for (i in seq_along(parts)) {
  # convert the pieces of row i to numbers and average them
  means[i] <- mean(as.numeric(parts[[i]]))
}
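
The same loop can probably be written more compactly in base R with vapply(). This is just a sketch on two rows copied from the question; the variable name measurements is made up for the example:

```r
# Two rows copied from the question's data, so the sketch is self-contained.
measurements <- c('58, 26, 38', '57, 24')

# strsplit() returns one character vector per input string.
parts <- strsplit(measurements, ',')

# vapply() applies the function to each element and checks that every
# result is a single numeric value. as.numeric() ignores the leading spaces.
means <- vapply(parts, function(x) mean(as.numeric(x)), numeric(1))
```

With the full data, df$Measurements would take the place of measurements.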

CodePudding user response:

I think you are looking for something like this:

library(tidyverse)
library(stringr)

# piping df into lapply the way you wrote it only works if the call is wrapped in '{}'
# \(x) is shorthand for function(x), i.e. an anonymous function (R >= 4.1)
df$Measurements <- df %>% 
  {lapply(str_split(.$Measurements, ', '), \(x) x %>% 
            as.numeric() %>% 
            mean())} %>% 
  do.call(rbind, .)

# A tibble: 22 × 2
   Time    Measurements[,1]
   <chr>              <dbl>
 1 June 7              65.3
 2 June 8              63.7
 3 June 9              40.7
 4 June 10             64.2
 5 June 11             64.7
 6 June 12             40.5
 7 June 13             93.7
 8 June 14             49.7
 9 June 15             39  
10 June 16             43  
# … with 12 more rows
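
As for the purrr::map question: map_dbl() looks like a natural fit, because it returns a plain numeric vector rather than a list or a one-column matrix. A minimal sketch, assuming a small subset of the data (df_small is an illustrative name, not from the question):

```r
library(tidyverse)

# Illustrative subset of the question's data (df_small is a made-up name).
df_small <- tibble(
  Time = c('June 9', 'June 12', 'June 15'),
  Measurements = c('58, 26, 38', '57, 24', '18, 60, 39')
)

result <- df_small %>%
  mutate(
    # str_split() gives one character vector per row;
    # map_dbl() reduces each of them to a single numeric mean
    mean = map_dbl(str_split(Measurements, ',\\s*'), ~ mean(as.numeric(.x)))
  )
```

This adds the mean as a new column and leaves Measurements untouched, so the original strings are still available afterwards.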