Unravel Data into a more friendly DataFrame

Sorry if this has been asked before; I couldn't find it because I'm not even sure how to search for this. Here is my problem: I have this data frame in R:

Area        Item                                                              Year      Unit  Value
   <chr>       <chr>                                                             <chr>     <chr> <chr>
 1 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2000-2002 %     87   
 2 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2001-2003 %     88   
 3 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2002-2004 %     91   
 4 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2003-2005 %     92   
 5 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2004-2006 %     92   
 6 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2005-2007 %     94   
 7 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2006-2008 %     95   
 8 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2007-2009 %     97   
 9 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2008-2010 %     100  
10 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2009-2011 %     102

There are 51 items in the data frame, but some areas and some years don't have certain items. I would like an outcome like the following so that I can play with correlation matrices, heatmaps, data viz, etc., but I don't know how:

Area         Year       Item1 Item2 ... Item51
Afghanistan  2000-2002  87    NA    ... NA
Afghanistan  2001-2003  NA*   88    ... NA

*It could happen that there is an Item1 value for 2001-2003 in Afghanistan, but I left it out of the example.

Here Item1 … Item51 are the names of the 51 different items, and the data frame is filled with NA wherever the value for that item in that area and year isn't measured.

Thanks!

CodePudding user response:

Given your explanation, I assume the data is sorted, i.e. each Area/Year combination has all 51 items in the same order, with missing values already present as NA.

# Mock data: 2 areas x 51 items x 12 three-year periods
df <- data.frame(Area  = c(rep("Afghanistan", 51*12),
                           rep("Pakistan", 51*12)),
                 Item  = paste("Average dietary item", rep(1:51, each = 12)),
                 Year  = rep(paste(2000:2011, 2002:2013, sep = "-"), 51),
                 Value = c(87, 88, 91, 92, 92, 94, 95, 97, 100, 102, 200, 300,
                           sample(100, 51*2*12 - 12, TRUE)))

# Split by Year/Area, transpose the 51 Values of each group into one row,
# then bind the rows back together
result <- do.call(rbind, by(df, list(df$Year, df$Area), function(x) {
  data <- data.frame(Area = unique(x$Area), Year = unique(x$Year), t(x$Value))
  colnames(data)[3:53] <- paste("Item", 1:51)
  data
}))

print(head(result[c(1,2,3:5,50:51)]))
#>          Area      Year Item 1 Item 2 Item 3 Item 48 Item 49
#> 1 Afghanistan 2000-2002     87     91     50      52      10
#> 2 Afghanistan 2001-2003     88     20     91      46      67
#> 3 Afghanistan 2002-2004     91     30     15      88      83
#> 4 Afghanistan 2003-2005     92     74     21      29      17
#> 5 Afghanistan 2004-2006     92     87     65      71      66
#> 6 Afghanistan 2005-2007     94     58     41      46      49
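As a minimal sketch of what the wide format then lets you do (names like `items` and `cmat` are just illustrative), the item columns of `result` can go straight into `cor()` and a base heatmap; `use = "pairwise.complete.obs"` copes with the NAs your real data will have:

items <- as.matrix(result[ , -(1:2)])          # drop Area and Year, keep Item columns
cmat  <- cor(items, use = "pairwise.complete.obs")  # pairwise-complete correlations

heatmap(cmat, symm = TRUE)                     # quick base-R heatmap of the correlations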

Another way, with dplyr/tidyr:

library(dplyr)
library(tidyr)

# Number the items within each Year/Area group, then spread them into columns
result2 <- df %>%
  group_by(Year, Area) %>%
  mutate(id = 1:n()) %>%
  select(Area, id, Year, Value) %>%
  pivot_wider(id_cols = c(Area, Year), names_from = id,
              names_prefix = "Item", values_from = Value)

print(head(result2[c(1,2,3:5,50:51)]))
#> # A tibble: 6 × 7
#> # Groups:   Year, Area [6]
#>   Area        Year      Item1 Item2 Item3 Item48 Item49
#>   <chr>       <chr>     <dbl> <dbl> <dbl>  <dbl>  <dbl>
#> 1 Afghanistan 2000-2002    87    91    50     52     10
#> 2 Afghanistan 2001-2003    88    20    91     46     67
#> 3 Afghanistan 2002-2004    91    30    15     88     83
#> 4 Afghanistan 2003-2005    92    74    21     29     17
#> 5 Afghanistan 2004-2006    92    87    65     71     66
#> 6 Afghanistan 2005-2007    94    58    41     46     49

Created on 2022-11-07 with reprex v2.0.2
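If your real data does not have a full, ordered set of 51 rows per Area/Year (which the approaches above assume), a safer sketch is to pivot on the actual item names instead of a running index; Area/Year/Item combinations that are absent from the long data then come out as NA automatically (column names will be the long item descriptions):

result3 <- df %>%
  # one column per distinct Item; missing combinations are filled with NA
  pivot_wider(id_cols = c(Area, Year), names_from = Item, values_from = Value)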
