I'm trying to summarise values within my dataset that contain the word Paris
, for example:
Data looks like this:
# A tibble: 21 x 2
lc_names GHI
<chr> <dbl>
1 Oloron-Sainte-Marie 1252.
2 Orléans 1189.
3 Palaiseau 1162.
4 Pamiers 1388.
5 Paris, 11e arrondissement 1149.
6 Paris, 12e arrondissement 1148.
7 Paris, 12e arronissement 1148.
8 Paris, 13e arrondissement 1151.
9 Paris, 14e arrondissement 1154.
10 Paris, 15e arrondissement 1154.
# … with 11 more rows
Example code:
agg_france %>% group_by(lc_names) %>% summarise(across(contains('Paris'), mean))
However, this does not produce the output expected.
Expected output:
1 Oloron-Sainte-Marie 1252
2 Orléans 1189
3 Palaiseau 1162
4 Pamiers 1388
5 Paris 1149 1149
Reproducible code:
structure(list(lc_names = c("Oloron-Sainte-Marie", "Orléans",
"Palaiseau", "Pamiers", "Paris, 11e arrondissement", "Paris, 12e arrondissement",
"Paris, 12e arronissement", "Paris, 13e arrondissement", "Paris, 14e arrondissement",
"Paris, 15e arrondissement", "Paris, 16e arrondissement", "Paris, 17e arrondissement",
"Paris, 18e arrondissement", "Paris, 19e arrondissement", "Paris, 1er arrondissement",
"Paris, 20e arrondissement", "Paris, 2e arrondissement", "Paris, 3e arrondissement",
"Paris, 4e arrondissement", "Paris, 5e arrondissement", "Paris, 6e arrondissement"
), GHI = c(1251.96996613576, 1189.28457903792, 1162.44717515565,
1387.88195190278, 1148.93280509349, 1147.88753688082, 1147.68375976562,
1151.08216545596, 1153.95916748047, 1153.74241428749, 1151.25635477972,
1146.90766517376, 1145.5153276716, 1144.86586292338, 1149.09527587891,
1146.63027231317, 1148.66392074093, 1148.57146857767, 1148.67533126532,
1149.40849720348, 1151.19995944783)), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
Here's one way to accomplish this. The idea is to first str_match all strings containing Paris
, and then create a new location variable that is either Paris or something else.
library(tidyverse)
library(stringr)
data %>% mutate(loc = str_match(lc_names, 'Paris')) %>%
mutate(loc = ifelse(is.na(loc), lc_names, loc)) %>%
mutate(loc = as.vector(loc)) %>% # str_match returns a matrix
group_by(loc) %>%
summarise(avg_GHI = mean(GHI))
Output:
# A tibble: 5 x 2
loc avg_GHI
<chr> <dbl>
1 Oloron-Sainte-Marie 1252.
2 Orléans 1189.
3 Palaiseau 1162.
4 Pamiers 1388.
5 Paris 1149.