I have a list waterlevels
of nine identical dataframes containing information about piezometers K01 to K09. Each of these dataframes contains 96860 rows, all of them having 13 variables, except for one having 21 variables (the extra columns in these one are not important).
$K01
dateandtime seconds pressure_kPa temperature_.C baropressure_kpa barocompensation_kpa waterlevel_cm lengthcables_cm
1 2019-11-05 00:00:00 0 111.482 13.261 98.854 12.628 128.7697 490
2 2019-11-05 00:15:00 900 111.506 13.261 98.883 12.623 128.7188 490
3 2019-11-05 00:30:00 1800 111.511 13.261 98.872 12.639 128.8819 490
4 2019-11-05 00:45:00 2700 111.544 13.261 98.898 12.646 128.9533 490
5 2019-11-05 01:00:00 3600 111.536 13.313 98.913 12.623 128.7188 490
I want to melt this list to one dataframe containing only one variable barocompensation_kpa
and the dateandtime
variable using following code:
waterlevels_all <- melt(waterlevels, id.vars=c("dateandtime", "barocompensation_kpa"))
But then the resulting dataframe waterlevels_all
has 10364020 observations, while only this df only should have 871740 rows (9*96860). So it has to have 3 variables: dateandtime
, barocompensation_kpa
and L1
being the name of corresponding dataframe (= name of the piezometer - K01 to K09).
> head(waterlevels_all)
dateandtime barocompensation_kpa variable value L1
1 2019-11-05 00:00:00 17.55336 seconds 0 K01
2 2019-11-05 00:15:00 17.55489 seconds 900 K01
3 2019-11-05 00:30:00 17.55703 seconds 1800 K01
4 2019-11-05 00:45:00 17.55347 seconds 2700 K01
5 2019-11-05 01:00:00 17.55540 seconds 3600 K01
6 2019-11-05 01:15:00 17.56050 seconds 4500 K01
How is this possible? How do I get the right dataframe?
CodePudding user response:
Here are some options with tidyverse
- Loop over the list
with imap
, select
the columns of interest and create the new column 'L1' with the list element name (.y
)
library(dplyr)
library(purrr)
imap_dfr(waterlevels, ~ .x %>%
select(dateandtime, barocompensation_kpa) %>%
mutate(L1 = .y))
Or with map
, just select
the columns and create the new column with .id
(_dfr
returns a single dataset by rbinding the list elements)
map_dfr(waterlevels, ~ .x %>%
select(dateandtime, barocompensation_kpa), .id = "L1")
CodePudding user response:
Try lapply
with a subsequent Map
and rbind
.
lapply(lst, subset, select=c('dateandtime', 'barocompensation_kpa')) |>
Map(f=`[<-`, 'L1', value=names(lst)) |>
# c(make.row.names=FALSE) |> ## optional, to reset row names
do.call(what=rbind)
# dateandtime barocompensation_kpa L1
# K01.1 0 0 K01
# K01.2 0 0 K01
# K01.3 0 0 K01
# K02.1 0 0 K02
# K02.2 0 0 K02
# K02.3 0 0 K02
# K03.1 0 0 K03
# K03.2 0 0 K03
# K03.3 0 0 K03
Data:
lst <- list(K01 = structure(list(dateandtime = c(0, 0, 0), seconds = c(0,
0, 0), pressure_kPa = c(0, 0, 0), temperature_.C = c(0, 0, 0),
baropressure_kpa = c(0, 0, 0), barocompensation_kpa = c(0,
0, 0), waterlevel_cm = c(0, 0, 0), lengthcables_cm = c(0,
0, 0)), row.names = c(NA, -3L), class = "data.frame"), K02 = structure(list(
dateandtime = c(0, 0, 0), seconds = c(0, 0, 0), pressure_kPa = c(0,
0, 0), temperature_.C = c(0, 0, 0), baropressure_kpa = c(0,
0, 0), barocompensation_kpa = c(0, 0, 0), waterlevel_cm = c(0,
0, 0), lengthcables_cm = c(0, 0, 0)), row.names = c(NA, -3L
), class = "data.frame"), K03 = structure(list(dateandtime = c(0,
0, 0), seconds = c(0, 0, 0), pressure_kPa = c(0, 0, 0), temperature_.C = c(0,
0, 0), baropressure_kpa = c(0, 0, 0), barocompensation_kpa = c(0,
0, 0), waterlevel_cm = c(0, 0, 0), lengthcables_cm = c(0, 0,
0), X9 = c(0, 0, 0), X10 = c(0, 0, 0), X11 = c(0, 0, 0), X12 = c(0,
0, 0), X13 = c(0, 0, 0), X14 = c(0, 0, 0), X15 = c(0, 0, 0),
X16 = c(0, 0, 0), X17 = c(0, 0, 0), X18 = c(0, 0, 0), X19 = c(0,
0, 0), X20 = c(0, 0, 0), X21 = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L)))