I have the following data frame:
df <- structure(list(Number = c("3117", "3118", "3119", "3120", "3121",
"3122"), City = c("Акмолинская", "Актюбинская", "Алматинская",
"Атырауская", "ЗКО", "Жамбылская"), Year = c("2001", "2001",
"2001", "2001", "2001", "2001"), Info = c("Среднегодовая численность населения РК (чел.)",
"Среднегодовая численность населения РК (чел.)", "Среднегодовая численность населения РК (чел.)",
"Среднегодовая численность населения РК (чел.)", "Среднегодовая численность населения РК (чел.)",
"Среднегодовая численность населения РК (чел.)"), Value = c("765690",
"669198", "1554447", "445631", "600987", "980563"), Status = c("Факт",
"Факт", "Факт", "Факт", "Факт", "Факт")), row.names = c(NA, 6L
), class = "data.frame")
I need to sum the Value column for each Year and add that total as a new row with "Республика Казахстан" (Republic of Kazakhstan) in the City column. In other words, for each year I want a row holding the sum of Value over all regions, labelled with the country name in City. How can I do that?
I tried the following code, but it fails with the error "invalid 'type' (character) of argument":
for (year in unique(df$Year)) {
  df[nrow(df) + 1, ] = c("0", "Республика Казахстан", year, "Среднегодовая численность населения РК (чел.)", sum(df[which(df[, 3] == year), 5]), "Факт")
}
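For comparison, the loop approach above can work once the summed values are converted to numbers: the error is raised because sum() receives a character vector. Below is a sketch of a repaired loop, assuming the same six rows as the dput above (rebuilt with data.frame() so the block runs on its own). The total is converted back to text because every column in this data frame is character.

```r
# Rebuild the example data (same values as the dput above)
df <- data.frame(
  Number = c("3117", "3118", "3119", "3120", "3121", "3122"),
  City   = c("Акмолинская", "Актюбинская", "Алматинская",
             "Атырауская", "ЗКО", "Жамбылская"),
  Year   = rep("2001", 6),
  Info   = rep("Среднегодовая численность населения РК (чел.)", 6),
  Value  = c("765690", "669198", "1554447", "445631", "600987", "980563"),
  Status = rep("Факт", 6),
  stringsAsFactors = FALSE
)

for (year in unique(df$Year)) {
  # as.numeric() turns the character values into numbers so sum() accepts them
  total <- sum(as.numeric(df$Value[df$Year == year]))
  # Assigning to row nrow(df) + 1 appends a row; each element goes to one column.
  # The total is stored as text to match the other (character) Value entries.
  df[nrow(df) + 1, ] <- c("0", "Республика Казахстан", year,
                          "Среднегодовая численность населения РК (чел.)",
                          as.character(total), "Факт")
}
df
```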
Answer:
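(Up front: my Emacs/ESS session isn't rendering the UTF-8 strings, so they look empty in the output below. They are not actually empty.)
First, Value is stored as character, and sum() cannot add character values, so convert it to numeric. Then compute the per-year totals and append them as new rows to the original data. (The code below labels the total row "City Sum"; replace that with "Республика Казахстан" to get the row you asked for.)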
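A minimal way to see the problem, using a few of the Value strings from the question: sum() rejects a character vector, and converting with as.numeric() first fixes it. The variable names values, res and total are just for illustration.

```r
# A character vector, like df$Value in the question
values <- c("765690", "669198", "1554447")

# sum() refuses character input; capture the error message instead of stopping
res <- tryCatch(sum(values), error = function(e) conditionMessage(e))
print(res)  # the "invalid 'type' (character) of argument" message

# After conversion to numeric the sum works
total <- sum(as.numeric(values))
print(total)  # [1] 2989335
```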
base R
df$Value <- as.numeric(df$Value)
newdf <- transform(aggregate(Value ~ Year, data = df, FUN = sum), City = "City Sum")
newdf <- cbind(newdf, df[,setdiff(names(df), names(newdf))][0,][NA,])
rbind(df, newdf[,names(df)])
# Number City Year Info Value Status
# 1 3117 2001 ( .) 765690
# 2 3118 2001 ( .) 669198
# 3 3119 2001 ( .) 1554447
# 4 3120 2001 ( .) 445631
# 5 3121 2001 ( .) 600987
# 6 3122 2001 ( .) 980563
# 7 <NA> City Sum 2001 <NA> 5016516 <NA>
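The code above leaves Number, Info and Status as NA in the total row. If you want the total row to look like the one built in the question's loop, one option (a sketch, not the only way) is to add Info and Status to the aggregate() grouping so they are carried into the totals, then set City to the country name. The "0" used for Number simply mirrors the question's attempt.

```r
# Rebuild the example data (same values as the dput above)
df <- data.frame(
  Number = c("3117", "3118", "3119", "3120", "3121", "3122"),
  City   = c("Акмолинская", "Актюбинская", "Алматинская",
             "Атырауская", "ЗКО", "Жамбылская"),
  Year   = rep("2001", 6),
  Info   = rep("Среднегодовая численность населения РК (чел.)", 6),
  Value  = c("765690", "669198", "1554447", "445631", "600987", "980563"),
  Status = rep("Факт", 6),
  stringsAsFactors = FALSE
)
df$Value <- as.numeric(df$Value)

# Grouping by Info and Status as well keeps those (constant) columns in the totals
totals <- aggregate(Value ~ Year + Info + Status, data = df, FUN = sum)
totals$City   <- "Республика Казахстан"
totals$Number <- "0"  # placeholder id, as in the question's loop

# Put the columns in the same order as df before stacking the rows
result <- rbind(df, totals[, names(df)])
result
```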
dplyr
library(dplyr)
df <- mutate(df, Value = as.numeric(Value))
df %>%
  group_by(Year) %>%
  summarize(City = "City Sum", Value = sum(Value)) %>%
  bind_rows(df, .)
# Number City Year Info Value Status
# 1 3117 2001 ( .) 765690
# 2 3118 2001 ( .) 669198
# 3 3119 2001 ( .) 1554447
# 4 3120 2001 ( .) 445631
# 5 3121 2001 ( .) 600987
# 6 3122 2001 ( .) 980563
# 7 <NA> City Sum 2001 <NA> 5016516 <NA>
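The same idea with dplyr, as a sketch assuming dplyr 1.0 or later (for the .groups argument): grouping by Year, Info and Status keeps those columns in the summary, and mutate() adds the country name and the question's placeholder Number of "0".

```r
library(dplyr)

# Rebuild the example data (same values as the dput above)
df <- data.frame(
  Number = c("3117", "3118", "3119", "3120", "3121", "3122"),
  City   = c("Акмолинская", "Актюбинская", "Алматинская",
             "Атырауская", "ЗКО", "Жамбылская"),
  Year   = rep("2001", 6),
  Info   = rep("Среднегодовая численность населения РК (чел.)", 6),
  Value  = c("765690", "669198", "1554447", "445631", "600987", "980563"),
  Status = rep("Факт", 6),
  stringsAsFactors = FALSE
)
df <- mutate(df, Value = as.numeric(Value))

# One total per year; Info and Status ride along as grouping columns
totals <- df %>%
  group_by(Year, Info, Status) %>%
  summarize(Value = sum(Value), .groups = "drop") %>%
  mutate(City = "Республика Казахстан", Number = "0")

# bind_rows() matches columns by name and appends the totals at the end
result <- bind_rows(df, totals)
result
```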