I'm supposed to combine all four datasets from https://www.drroyspencer.com/latest-global-temperatures/ into one dataframe. I've tried to merge it without luck. Any help?
What I've tried so far.
# Lower troposphere data.
lower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ")
# Mid-Troposphere data.
mid_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tmt/uahncdc_mt_6.0.txt", sep = " ")
# Tropopause data.
trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/ttp/uahncdc_tp_6.0.txt", sep = " ")
# Lower Stratosphere data.
lower_strat <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tls/uahncdc_ls_6.0.txt" , sep = " ")
#
lower_trop1 <- lower_trop[-c(518)]
mid_trop1 <- Mid_trop[-c(518)]
trop1 <- trop[-c(518)]
lower_strat1 <- lower_strat[-c(518)]
#
lower_trop <- as.data.frame(apply(lower_trop1, 2, as.numeric))
mid_trop <- as.data.frame(apply(mid_trop1, 2, as.numeric))
trop_paus <- as.data.frame((apply(trop1, 2, as.numeric)))
lower_strat <- as.data.frame(apply(lower_strat1, 2, as.numeric))
all<- rbind(lower_trop,mid_trop,trop_paus,lower_strat)
CodePudding user response:
Don't ignore warnings. Your
fread
calls are warninglower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ") # [100%] Downloaded 89265 bytes... # Warning in fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", : # Stopped early on line 520. Expected 29 fields but found 0. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18>>
This is because those files have trailing material that are not "proper CSV" (as far as R cares, that is):
Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST 1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62 -0.59 -0.37 -0.44 -0.30 -0.46 -0.55 -0.45 -0.39 -0.68 -0.06 -0.45 -0.38 -0.49 -1.29 -1.15 -1.29 1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54 -0.45 -0.73 -0.93 -0.55 -0.23 0.12 -0.29 -0.46 -0.95 0.10 -0.16 -0.15 -0.16 -3.22 -2.42 0.92 ...snip... 2021 11 0.08 0.22 0.03 0.11 0.32 -0.02 0.06 -0.00 0.07 0.14 0.35 0.08 0.09 0.31 -0.10 0.02 -0.24 0.06 -0.43 -0.19 -0.70 -0.72 -0.67 -0.75 0.50 -0.06 -0.29 2021 12 0.21 0.17 0.22 0.27 0.19 0.32 0.15 0.12 0.15 0.03 0.30 -0.04 0.40 0.19 0.59 0.20 -0.07 0.25 0.01 -0.25 0.31 -0.42 -0.24 -0.51 1.63 1.42 -0.06 Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18 NOTE: Version 6.0 as of April 2015 NOTE: New Reference for annual cycle 1991-2020 NOTE: Version 5.6 as of Jun 2013 NOTE: 3/13 More accurate grid weighting for USA48 NOTE: 3/13 Alaska added in USA49, Australia added GL 90S-90N, NH 0-90N, SH 90S-0, TRPCS 20S-20N NoExt 20N-90N, SoExt 90S-20S, NoPol 60N-90N, SoPol 90S-60S
There is no easy way to just "ignore" those lines, so I suggest you be very clear about it: state that that warning is expected and not a concern here. Your future self (and colleagues) will thank you. You can use this to silence just that warning (and allow
fread
to continue normal operations):lower_trop <- withCallingHandlers( fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " "), warning = function(w) { if (grepl("Stopped early on line", as.character(w))) invokeRestart("muffleWarning") })
R dislikes non-unique column names. I don't know anything about the data, but it appears that
Land
andOcean
are grouped by the preceding columnsGlobe
,NH
,SH
, etc. Let's rename them to ensure they are unique and unambiguous (not just numbers):names(lower_trop1)[lands] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands]) names(lower_trop1)[lands 1] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands 1]) head(lower_trop1) # Year Mo Globe Globe_Land Globe_Ocean NH NH_Land NH_Ocean SH SH_Land SH_Ocean Trpcs Trpcs_Land # <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> # 1: 1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62 # 2: 1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54 # 3: 1979 2 -0.43 -0.56 -0.39 -0.47 -0.57 -0.41 -0.39 -0.53 -0.37 -0.36 -0.25 # 4: 1979 3 -0.38 -0.51 -0.33 -0.46 -0.51 -0.44 -0.30 -0.53 -0.26 -0.36 -0.43 # 5: 1979 4 -0.40 -0.57 -0.34 -0.47 -0.62 -0.37 -0.34 -0.46 -0.31 -0.35 -0.37 # 6: 1979 5 -0.40 -0.56 -0.33 -0.52 -0.54 -0.52 -0.27 -0.62 -0.19 -0.46 -0.55 # 16 variables not shown: [Trpcs_Ocean <char>, NoExt <char>, NoExt_Land <char>, NoExt_Ocean <char>, SoExt <char>, SoExt_Land <char>, SoExt_Ocean <char>, NoPol <char>, NoPol_Land <char>, NoPol_Ocean <char>, ...]
(Almost) Never use
apply
on adata.frame
. Okay, almost never, there are legitimate times to do it. However, by default,apply
is going to return amatrix
when you really need it to remain adata.frame
(or at leastlist
-like). Uselapply
instead:lower_trop1[] <- lapply(lower_trop1, as.numeric)
Two tricks to know about this and how it works with frames:
A
data.frame
is just a glorifiedlist
with some added conditions: all elements should be named uniquely; all elements (typically) must be the same length; it may have row-names, some tools use/honor/keep them, some tools (dplyr
) ignore or actively remove them. Oh, and theclass
is slightly different. But most things that work onlist
s work just as well ondata.frame
s, column-wise.Even though we're working on columns of a frame, in one sense we're just working on elements of a
list
, and we want it to staylist
-like, so we should be usinglapply
.Unfortunately,
lapply
is going to return alist
not adata.frame
, so the reassignment intolower_trop1[] <- ...
says to replace the contents oflower_trop1
without changing its class. There is likely more that can be said here, but know thatframe[] <- lapply(frame, ...)
preserves thedata.frame
properties we need while allowing very simplelapply
-operations.
From here, rbind
should work as intended.
FYI, you can combine your middle two steps with:
lower_trop1 <- lower_trop[-518,][, lapply(.SD, as.numeric)]
though you still need to do the renaming for unique columns names.
CodePudding user response:
I guess you want some sort of join (so joining the columns from the 4 datasets). Here some information on the different join options in R: https://dplyr.tidyverse.org/reference/mutate-joins.html
Now in your case, you have the Year
and the Mo
columns that appear in your different datasets. So you could join the datasets by these columns. Here an example of a left join (just for the first two dataframes):
First, make sure that you have unique column names in each single dataframe (you do currently not have unique names, so this step is necessary):
names(lower_trop) <- make.unique(names(lower_trop), sep="_")
names(mid_trop) <- make.unique(names(lower_trop), sep="_")
Then you join these with a left join (so you keep all the rows from the left dataset):
new_df <- left_join(lower_trop, mid_trop, by = c("Year", "Mo"), suffix = c(".lower_trop", ".mid_trop"))