I have this data in a spreadsheet
Country Sales
Spain 1 1000
Spain 2 200
France 300
Nigeria 1 500
Nigeria 2 700
I want the sum of this country's sales stored a seperate dataframe.
I tried using the dplyr function but the result is not what i want
Here is the output I want please
Country Sum_of_sales
Spain 1200
France 300
Nigeria 1200
Is there a way I can run this on R that will give me this above output stored in a separate dataframe please.
CodePudding user response:
Strip the trailing numbers (and spaces) off the country labels, then do a normal grouped sum:
library(dplyr)
df %>%
mutate(Country= gsub(pattern = " *[0-9] ", replacement = "", x = Country)) %>%
group_by(Country) %>%
summarize(Sum_of_Sales = sum(Sales))
# # A tibble: 3 × 2
# Country Sum_of_Sales
# <chr> <int>
# 1 France 300
# 2 Nigeria 1200
# 3 Spain 1200
Using this sample input:
df = read.table(text = "Country Sales
'Spain 1' 1000
'Spain 2' 200
'France' 300
'Nigeria 1' 500
'Nigeria 2' 700", header = T)
CodePudding user response:
If you wanna keep it tidyverse
, you can use extract
:
library(tidyr)
library(dplyr)
df %>%
extract(Country, "Country") %>%
group_by(Country) %>%
summarise(Sum_of_sales = sum(Sales))
# A tibble: 3 × 2
Country Sum_of_sales
<chr> <int>
1 France 300
2 Nigeria 1200
3 Spain 1200
For more complex cases, you can use:
extract(Country, "Country", "([A-Za-z -]*)\\s*[0-9]*")
CodePudding user response:
An option with str_remove
library(dplyr)
library(stringr)
df %>%
group_by(Country = str_remove(Country, "\\s \\d ")) %>%
summarise(Sum_of_sales = sum(Sales))
-output
# A tibble: 3 × 2
Country Sum_of_sales
<chr> <int>
1 France 300
2 Nigeria 1200
3 Spain 1200
CodePudding user response:
library(dplyr)
df <- data.frame(
Country = c("Spain","Spain","France","Nigeria","Nigeria"),
Sales = c(1000,200,300,500,700)
)
df_by_country <-
df %>%
group_by(Country) %>%
summarise(Sales = sum(Sales,na.rm = TRUE))
df_by_country
# A tibble: 3 x 2
Country Sales
<chr> <dbl>
1 France 300
2 Nigeria 1200
3 Spain 1200