Sorry if this has been asked before, couldn't find the forum cause I'm not even sure how to look this up. But this is my problem, I have this dataframe in R:
Area Item Year Unit Value
<chr> <chr> <chr> <chr> <chr>
1 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2000-2002 % 87
2 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2001-2003 % 88
3 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2002-2004 % 91
4 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2003-2005 % 92
5 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2004-2006 % 92
6 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2005-2007 % 94
7 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2006-2008 % 95
8 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2007-2009 % 97
9 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2008-2010 % 100
10 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2009-2011 % 102
There are 51 items in the data frame, but some areas and some years don't have certain items. I would like an outcome as follows to be able to play with Correlation matrixes, heatmaps, Data Viz, etc., but I don't know how:
Area Year Item1 Item2 ... Item52
Afghanistan 2000-2002 87 NA ... NA
Afghanistan 2001-2002 NA* 88 ... NA
*It could happen that there is an Item1 value for 2001-2002 in Afghanistan, but for the example I didn't put it.
Where Item-i is the name of the 51 different items and the dataframe is filled with NAs where the value for that Item in that area and year isn't meassured.
Thanks!
CodePudding user response:
Given your explanation i assume the data is sorted i.e. 51 elements with possible missings as NA in order.
df<-data.frame(Area=c(rep("Afghanistan", 51*12),
rep("Pakistan", 51*12)),
Item=paste("Average dieatary item", rep(1:51, each=12)),
Year = rep(paste(2000:2011, 2002:2013, sep="-"), 51),
Value = c(87,88,91,92,92,94,95,97,100,102,200,300, sample(100, 51*2*12-12,T)))
result <-do.call(rbind, by(df, list(df$Year, df$Area), function(x) {
data <- data.frame(Area = unique(x$Area), Year = unique(x$Year), t(x$Value))
colnames(data)[3:53] = paste("Item",1:51)
data
}))
print(head(result[c(1,2,3:5,50:51)]))
#> Area Year Item 1 Item 2 Item 3 Item 48 Item 49
#> 1 Afghanistan 2000-2002 87 91 50 52 10
#> 2 Afghanistan 2001-2003 88 20 91 46 67
#> 3 Afghanistan 2002-2004 91 30 15 88 83
#> 4 Afghanistan 2003-2005 92 74 21 29 17
#> 5 Afghanistan 2004-2006 92 87 65 71 66
#> 6 Afghanistan 2005-2007 94 58 41 46 49
other way with dplyr/tidyr
library(dplyr)
library(tidyr)
result2 <- df %>% group_by(Year, Area) %>% mutate(id= 1:n()) %>%
select(Area, id, Year, Value) %>%
pivot_wider(c(Area, Year), names_from = id, names_prefix = "Item", values_from = Value)
print(head(result2[c(1,2,3:5,50:51)]))
#> # A tibble: 6 × 7
#> # Groups: Year, Area [6]
#> Area Year Item1 Item2 Item3 Item48 Item49
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghanistan 2000-2002 87 91 50 52 10
#> 2 Afghanistan 2001-2003 88 20 91 46 67
#> 3 Afghanistan 2002-2004 91 30 15 88 83
#> 4 Afghanistan 2003-2005 92 74 21 29 17
#> 5 Afghanistan 2004-2006 92 87 65 71 66
#> 6 Afghanistan 2005-2007 94 58 41 46 49
Created on 2022-11-07 with reprex v2.0.2