Conditionally summing values in column in R-CodePudding

I have a large dataset that includes a breakdown of revenue and expenditure for each county in the US. There is a column (Type) that includes codes for the type of revenue or expenditure. I need to sum the total dollar amount (Flow) for each county for the total revenue and total expenditure. I have been trying to do with the following code and keep getting this error: Error in FUN(X[[I]], ...) : only defined on a data frame with all numeric-alike variables. Is there a way around this error or another way of doing what I want to do? Thanks in advance for your help!

Data linked here.

county2013$Expenditures <- county2013 %>%
  group_by(FIPS) %>%
  sum(county2013[which(county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B22' | county2013$Type == 'B30' | county2013$Type == 'B42' | county2013$Type == 'B46' | county2013$Type == 'B50' | county2013$Type == 'B59' | county2013$Type == 'B79' | county2013$Type == 'B80' | county2013$Type == 'B89' | county2013$Type == 'B91' | county2013$Type == 'B92' | county2013$Type == 'B93' | county2013$Type == 'B94' | county2013$Type == 'C21' | county2013$Type == 'C30' | county2013$Type == 'C42' | county2013$Type == 'C46' | county2013$Type == 'C50' | county2013$Type == 'C79' | county2013$Type == 'C80' | county2013$Type == 'C89' | county2013$Type == 'C91' | county2013$Type == 'C92' | county2013$Type == 'C93' | county2013$Type == 'C94' | county2013$Type == 'D21' | county2013$Type == 'D30' | county2013$Type == 'D42' | county2013$Type == 'D46' | county2013$Type == 'D50' | county2013$Type == 'D79' | county2013$Type == 'D80' | county2013$Type == 'D89' | county2013$Type == 'D91' | county2013$Type == 'D92' | county2013$Type == 'D93' | county2013$Type == 'D94' | county2013$Type == 'T01'| county2013$Type =='T09' | county2013$Type =='T10' | county2013$Type == 'T11' | county2013$Type == 'T12'  | county2013$Type == 'T13' | county2013$Type == 'T14' | county2013$Type =='T15' | county2013$Type == 'T16' | county2013$Type == 'T19'| county2013$Type == 'T20' | county2013$Type == 'T21' | county2013$Type == 'T22' | county2013$Type == 'T23' | county2013$Type == 'T24' | county2013$Type == 'T25' | county2013$Type == 'T27' | county2013$Type == 'T28' | county2013$Type == 'T29' | county2013$Type == 'T40' | county2013$Type == 'T41' | county2013$Type == 'T50' | county2013$Type == 'T51' | county2013$Type == 'T53' | county2013$Type == 'T99' | county2013$Type == 'A01' | county2013$Type == 'A03' | county2013$Type == 'A09' | county2013$Type == 'A10' | county2013$Type == 'A12' | county2013$Type == 'A16'| county2013$Type == 'A18'| county2013$Type == 'A21' | county2013$Type == 'A36' | county2013$Type == 'A44' | county2013$Type == 'A45' | county2013$Type == 'A50' | county2013$Type == 'A56' | county2013$Type == 'A59'| county2013$Type == 'A60'| county2013$Type == 'A61'| county2013$Type == 'A80'| county2013$Type == 'A81'| county2013$Type == 'A87'| county2013$Type == 'A89' | county2013$Type == 'U01' | county2013$Type == 'U11' | county2013$Type == 'U20' | county2013$Type == 'U21' | county2013$Type == 'U30' | county2013$Type == 'U40' | county2013$Type == 'U41' | county2013$Type == 'U50' | county2013$Type == 'U95' | county2013$Type == 'U99' | county2013$Type == 'A90' | county2013$Type == 'A91' | county2013$Type == 'A92' | county2013$Type == 'A93' | county2013$Type == 'A94' | county2013$Type == 'X01' | county2013$Type == 'X02' | county2013$Type == 'X05' | county2013$Type == 'X08' | county2013$Type == 'Y01' | county2013$Type == 'Y02' | county2013$Type == 'Y04' | county2013$Type == 'Y11' | county2013$Type == 'Y12' | county2013$Type == 'Y51' | county2013$Type == 'Y52'), 5])

CodePudding user response：

The error most likely occurs because your result frame contains characters and numbers. The next solution works on the limited info available on an artificial toy-example:

data
  [1] "B01" "B21" "B22" "B30" "B42" "B46" "B50" "B59" "B79" "B80" "B89" "B91"
 [13] "B92" "B93" "B94" "C21" "C30" "C42" "C46" "C50" "C79" "C80" "C89" "C91"
 [25] "C92" "C93" "C94" "D21" "D30" "D42" "D46" "D50" "D79" "D80" "D89" "D91"
 [37] "D92" "D93" "D94" "T01" "T09" "T10" "T11" "T12" "T13" "T14" "T15" "T16"
 [49] "T19" "T20" "T21" "T22" "T23" "T24" "T25" "T27" "T28" "T29" "T40" "T41"
 [61] "T50" "T51" "T53" "T99" "A01" "A03" "A09" "A10" "A12" "A16" "A18" "A21"
 [73] "A36" "A44" "A45" "A50" "A56" "A59" "A60" "A61" "A80" "A81" "A87" "A89"
 [85] "U01" "U11" "U20" "U21" "U30" "U40" "U41" "U50" "U95" "U99" "A90" "A91"
 [97] "A92" "A93" "A94" "X01" "X02" "X05" "X08" "Y01" "Y02" "Y04" "Y11" "Y12"
[109] "Y51" "Y52"

sum(county2013$Flow[county2013$Type %in% data])

CodePudding user response：

I hope I understand what you are trying to accomplish. There are 323 unique "Type", and some of them are expenditures and some are revenues. You want to group by either revenue or expendidure.

I do not know if the alternatives you list are exependiture or revenues, but for the example let's say it is expenditure.

Rather than writing a long "county2013$Type == 'B01' | county2013$Type == 'B21' etc, it is easier to put the values of Type that corresponds to expenditures and put them into a vector. To save you some typing, I used stringr to modify your code and accomplish this.

library(tidyverse)
text <- c("county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B22' | county2013$Type == 'B30' | county2013$Type == 'B42' | county2013$Type == 'B46' | county2013$Type == 'B50' | county2013$Type == 'B59' | county2013$Type == 'B79' | county2013$Type == 'B80' | county2013$Type == 'B89' | county2013$Type == 'B91' | county2013$Type == 'B92' | county2013$Type == 'B93' | county2013$Type == 'B94' | county2013$Type == 'C21' | county2013$Type == 'C30' | county2013$Type == 'C42' | county2013$Type == 'C46' | county2013$Type == 'C50' | county2013$Type == 'C79' | county2013$Type == 'C80' | county2013$Type == 'C89' | county2013$Type == 'C91' | county2013$Type == 'C92' | county2013$Type == 'C93' | county2013$Type == 'C94' | county2013$Type == 'D21' | county2013$Type == 'D30' | county2013$Type == 'D42' | county2013$Type == 'D46' | county2013$Type == 'D50' | county2013$Type == 'D79' | county2013$Type == 'D80' | county2013$Type == 'D89' | county2013$Type == 'D91' | county2013$Type == 'D92' | county2013$Type == 'D93' | county2013$Type == 'D94' | county2013$Type == 'T01'| county2013$Type =='T09' | county2013$Type =='T10' | county2013$Type == 'T11' | county2013$Type == 'T12'  | county2013$Type == 'T13' | county2013$Type == 'T14' | county2013$Type =='T15' | county2013$Type == 'T16' | county2013$Type == 'T19'| county2013$Type == 'T20' | county2013$Type == 'T21' | county2013$Type == 'T22' | county2013$Type == 'T23' | county2013$Type == 'T24' | county2013$Type == 'T25' | county2013$Type == 'T27' | county2013$Type == 'T28' | county2013$Type == 'T29' | county2013$Type == 'T40' | county2013$Type == 'T41' | county2013$Type == 'T50' | county2013$Type == 'T51' | county2013$Type == 'T53' | county2013$Type == 'T99' | county2013$Type == 'A01' | county2013$Type == 'A03' | county2013$Type == 'A09' | county2013$Type == 'A10' | county2013$Type == 'A12' | county2013$Type == 'A16'| county2013$Type == 'A18'| county2013$Type == 'A21' | county2013$Type == 'A36' | county2013$Type == 'A44' | county2013$Type == 'A45' | county2013$Type == 'A50' | county2013$Type == 'A56' | county2013$Type == 'A59'| county2013$Type == 'A60'| county2013$Type == 'A61'| county2013$Type == 'A80'| county2013$Type == 'A81'| county2013$Type == 'A87'| county2013$Type == 'A89' | county2013$Type == 'U01' | county2013$Type == 'U11' | county2013$Type == 'U20' | county2013$Type == 'U21' | county2013$Type == 'U30' | county2013$Type == 'U40' | county2013$Type == 'U41' | county2013$Type == 'U50' | county2013$Type == 'U95' | county2013$Type == 'U99' | county2013$Type == 'A90' | county2013$Type == 'A91' | county2013$Type == 'A92' | county2013$Type == 'A93' | county2013$Type == 'A94' | county2013$Type == 'X01' | county2013$Type == 'X02' | county2013$Type == 'X05' | county2013$Type == 'X08' | county2013$Type == 'Y01' | county2013$Type == 'Y02' | county2013$Type == 'Y04' | county2013$Type == 'Y11' | county2013$Type == 'Y12' | county2013$Type == 'Y51' | county2013$Type == 'Y52'")


ex <- text %>% str_extract_all("== '(.{3})") %>% 
unlist() %>%
str_replace_all("== '", "")

We now have a vector called ex which contains all the codes in type that represent expenditures.

The second step is to add a column to the X2013 dataframe that states whether the row correspont to expendidure or revenue. To do this we use mutate, then ifelse, and the %in% operator which filters the values according to whether they are present in the "ex" vector.

The third step is to do a group_by using county and the newly created column exp_rev, then summarize the flow.

Hope this correspond to what you try to achieve :

X2013 %>% mutate(exp_rev = ifelse(Type %in% ex, "Expendidure", "Revenue")) %>% 
  group_by(County, exp_rev) %>%
  summarize(Flow_sum = sum(Flow))

# A tibble: 510 x 3
# Groups:   County [255]
   County exp_rev       Flow_sum
   <chr>  <chr>            <dbl>
 1 000    Expendidure 1940831046
 2 000    Revenue     9093188172
 3 001    Expendidure   49803211
 4 001    Revenue      239163156
 5 002    Expendidure   41984941
 6 002    Revenue      212603913
 7 003    Expendidure   22009389
 8 003    Revenue      104761407