Trying to categorize and condense data in R with a key word

df <- read.csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm", header=TRUE, stringsAsFactors=FALSE);
library(ggplot2)
library(qqplotr)
library(stats)
library(dplyr)



coverage_by_Geography = data.frame(avgcancerdiag= df$avgAnnCount, county = df$Geography, PubCoverage = df$PctPublicCoverage, privcoverage = df$PctPrivateCoverage, deathrt = df$avgDeathsPerYear)
# ggplot layers must be joined with `+`
ggplot(data = coverage_by_Geography, aes(x = privcoverage, y = deathrt)) + geom_col()
ggplot(data = coverage_by_Geography, aes(x = PubCoverage, y = deathrt)) + geom_col()

I am trying to take the counties listed in one column, condense them into states, and average their data so that I get state-level numbers instead of county-level ones. I am stumped on how to do it.

CodePudding user response：

A general tidyverse solution follows:

library(tidyverse)

df <- read_csv("https://query.data.world/s/gzjmftivszsy44ukfak2e7ksig35jm")

df %>%
  separate(Geography, c("county", "state"), ", ") %>% 
  select(state, county, everything()) %>% 
  group_by(state) %>% 
  summarize(across(-c(county), mean))

The code splits Geography into separate county and state columns. Grouping by state then lets you summarize the data per state. Here, I asked for the mean of every remaining column, but this probably doesn't make sense for all of the different data types: mean() returns NA with a warning for non-numeric columns, and NA for any numeric column that contains missing values. Hopefully this gets you closer to what you are looking for.
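
If you only want averages for the numeric columns, one option is to restrict across() with where(is.numeric) and drop missing values. Below is a minimal sketch of that idea. It uses a small made-up data frame instead of the real download, so the values and the income_bracket column are placeholders I invented for illustration; only Geography, avgDeathsPerYear and PctPrivateCoverage are names taken from the question.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the real data; all values are made up for illustration.
# income_bracket is a hypothetical non-numeric column.
toy <- tibble(
  Geography = c("Kitsap County, Washington", "Kittitas County, Washington",
                "Travis County, Texas", "Bexar County, Texas"),
  avgDeathsPerYear = c(400, 60, 1000, 2000),
  PctPrivateCoverage = c(75, 70, NA, 60),
  income_bracket = c("high", "mid", "mid", "low")
)

state_means <- toy %>%
  # Split "County, State" into two columns
  separate(Geography, into = c("county", "state"), sep = ", ") %>%
  group_by(state) %>%
  summarize(
    # Average only the numeric columns, ignoring missing values
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    # Count the counties last, so across() does not pick this column up
    n_counties = n(),
    .groups = "drop"
  )

print(state_means)
```

Keep in mind that this is an unweighted mean of the county values, so a small county counts as much as a large one. For count-like columns such as avgDeathsPerYear, a sum per state may be closer to what you want than a mean.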