Apologies: I know there is a wealth of information on this site about melting and reshaping data, but I cannot find the answer to my question on any of the pages I've visited. I have a data set which looks something like:
A Year | A Mean Temp | A Max Temp | A Min Temp | B Year | B Mean Temp | B Max Temp | B Min Temp |
and I want to end up with
Year | A Mean Temp | A Max Temp | A Min Temp | B Mean Temp | B Max Temp | B Min Temp
and fill columns which don't have data for that specific year with 'NA'.
The desired output would be something like:
[Table][1]
I believe the answer lies somewhere in something like:
```r
library(dplyr)
library(tidyr)
library(stringr)

Data %>%
  pivot_longer(cols = contains("Year"), names_to = c("Country", ".value"),
               names_sep = "_", values_drop_na = TRUE) %>%
  rename_with(~ str_c('Country_', .), Rating:Year)
```
But as of yet no luck.
Any help would be appreciated.
Thank you
Data
```r
structure(list(Antarctica.Year.CE = 167:172, Antarctica.Temp..C. = c(0.33,
0.31, 0.18, 0.08, -0.01, -0.11), Antarctica.Min..C. = c(-1.24,
-1.26, -1.39, -1.48, -1.57, -1.67), Antarctica.Max..C. = c(1.89,
1.87, 1.74, 1.64, 1.55, 1.45), Arctic.Year.CE = 1:6, Arctic.Temp..C. = c(-1.15,
-0.96, -0.32, 0.1, -0.18, -0.61), Arctic.Min..C. = c(-1.92, -1.76,
-1.38, -0.74, -1.08, -1.17), Arctic.Max..C. = c(-0.31, -0.11,
0.48, 0.83, 0.73, 0.16), Asia.Year.CE = 800:805, Asia.Temp..C. = c(-0.31,
-0.14, -0.36, -0.67, -0.78, -0.26), Asia.Min..C. = c(-1.4, -1.23,
-1.45, -1.76, -1.87, -1.35), Asia.Max..C. = c(0.79, 0.96, 0.74,
0.43, 0.31, 0.83), Australasia.Year.CE = 1001:1006, Australasia.Temp..C. = c(-0.24,
-0.38, -0.29, -0.33, -0.34, -0.11), Australasia.Min..C. = c(-0.62,
-0.79, -0.71, -0.73, -0.73, -0.56), Australasia.Max..C. = c(0.15,
0.03, 0.13, 0.07, 0.05, 0.34), Europe.Year.CE = 1:6, Europe.Temp..C. = c(0.09,
-0.26, -0.24, 0.22, 0.32, 0.67), Europe.Min..C. = c(-0.69, -1.14,
-1.18, -0.66, -0.48, -0.11), Europe.Max..C. = c(0.88, 0.56, 0.61,
1.07, 1.14, 1.5), North.America...Pollen.Year.CE = c(480L, 510L,
540L, 570L, 600L, 630L), North.America...Pollen.Temp..C. = c(-0.25,
-0.29, -0.33, -0.34, -0.34, -0.34), North.America...Pollen.Min..C. = c(-0.74,
-0.7, -0.66, -0.65, -0.64, -0.64), North.America...Pollen.Max..C. = c(0.24,
0.11, 0, -0.04, -0.04, -0.04), North.America...Trees.Year.CE = c(1204L,
1214L, 1224L, 1234L, 1244L, 1254L), North.America...Trees.Temp..C. = c(-0.22,
-0.45, -0.38, -0.87, -0.81, -0.06), North.America...Trees.Min..C. = c(-0.53,
-0.72, -0.67, -1.12, -1.09, -0.35), North.America...Trees.Max..C. = c(0.04,
-0.2, -0.11, -0.57, -0.52, 0.18), South.America.Year.CE = 857:862,
South.America.Temp..C. = c(-0.3, -0.21, -0.07, -0.38, -0.41,
-0.19), South.America.Min..C. = c(-1.12, -1, -0.88, -1.19,
-1.22, -0.98), South.America.Max..C. = c(0.53, 0.58, 0.74,
0.43, 0.39, 0.61)), row.names = c(NA, 6L), class = "data.frame")
```
[1]: https://i.stack.imgur.com/0sV7a.png
CodePudding user response:
For something as small as this, I'd often just go with a more manual approach.

Given your `df` above, I specify the list of countries that appear in the column names, then use `grepl()` on the column names of `df` to select each country's columns. The function renames those columns, adds a `Country` column, and returns the new data frame. We can then apply the function to each country with `lapply()` and stack the results with `do.call(rbind, ...)`.
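```r
country_list = c('Antarctica', 'Arctic', 'Asia', 'Australasia', 'Europe',
                 'North.America...Pollen', 'North.America...Trees', 'South.America')

# Extract one country's columns and give them common names
get_cols = function(country) {
  # fixed = TRUE matches the name literally, so '.' is not treated as a regex wildcard
  df_new = df[, grepl(country, colnames(df), fixed = TRUE)]
  df_new$Country = rep(country, nrow(df_new))
  colnames(df_new) = c('Year', 'Temp', 'Min_Temp', 'Max_Temp', 'Country')
  return(df_new)
}

# Stack the per-country data frames into one long data frame
df_final = do.call(rbind, lapply(country_list, get_cols))
```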
Hope that returns what you're looking for?
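
The function above returns a long table with one row per country and year. If the wide layout from the question is wanted instead (one row per year, `NA` where a region has no data for that year), one possible base R sketch is to build one data frame per region and full-outer-join them on `Year` with `merge(..., all = TRUE)`. The toy `df`, the region names `A` and `B`, and the helper `get_wide_cols` are made up for illustration, and the sketch assumes each region's columns appear in the order Year, Temp, Min, Max, as they do in the question's data.

```r
# Toy stand-in for the question's data (hypothetical): two regions whose
# years only partly overlap.
df <- data.frame(
  A.Year.CE  = 1:3,
  A.Temp..C. = c(0.1, 0.2, 0.3),
  A.Min..C.  = c(-0.5, -0.4, -0.3),
  A.Max..C.  = c(0.7, 0.8, 0.9),
  B.Year.CE  = 2:4,
  B.Temp..C. = c(1.1, 1.2, 1.3),
  B.Min..C.  = c(0.5, 0.6, 0.7),
  B.Max..C.  = c(1.7, 1.8, 1.9)
)

region_list <- c("A", "B")

# Hypothetical helper: pull one region's four columns and prefix the
# measurement columns with the region name, keeping a shared 'Year' key.
# Assumes the column order is Year, Temp, Min, Max.
get_wide_cols <- function(region) {
  cols <- startsWith(colnames(df), paste0(region, "."))
  df_new <- df[, cols]
  colnames(df_new) <- c("Year",
                        paste(region, c("Temp", "Min_Temp", "Max_Temp"), sep = "_"))
  df_new
}

# Full outer join of all per-region data frames on Year.
# all = TRUE keeps years that appear in only one region and fills the
# other regions' columns with NA.
df_wide <- Reduce(function(x, y) merge(x, y, by = "Year", all = TRUE),
                  lapply(region_list, get_wide_cols))

# Years 1 and 4 appear in only one region, so the other region's
# columns are NA in those rows.
print(df_wide)
```

With the question's data, `region_list` would hold the same eight prefixes as `country_list`. Because `merge()` sorts on the key by default, the result comes back ordered by `Year`.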