I have a data frame that describes the path taken by a river, with the distance along it and the elevation. I need to treat each consecutive stretch of the same ground cover (Soil) as a separate path and extract the minimum and maximum distance and elevation for each one.
Example:
df = data.frame(Soil = c("Forest", "Forest",
"Grass", "Grass","Grass",
"Scrub", "Scrub","Scrub","Scrub",
"Grass", "Grass","Grass","Grass",
"Forest","Forest","Forest","Forest","Forest","Forest"),
Distance = c(1, 5,
10, 15, 56,
59, 67, 89, 99,
102, 105, 130, 139,
143, 145, 167, 189, 190, 230),
Elevation = c(1500, 1499,
1470, 1467, 1456,
1450, 1445, 1440, 1435,
1430, 1420, 1412, 1400,
1390, 1387, 1384, 1380, 1376, 1370))
Soil Distance Elevation
1 Forest 1 1500
2 Forest 5 1499
3 Grass 10 1470
4 Grass 15 1467
5 Grass 56 1456
6 Scrub 59 1450
7 Scrub 67 1445
8 Scrub 89 1440
9 Scrub 99 1435
10 Grass 102 1430
11 Grass 105 1420
12 Grass 130 1412
13 Grass 139 1400
14 Forest 143 1390
15 Forest 145 1387
16 Forest 167 1384
17 Forest 189 1380
18 Forest 190 1376
19 Forest 230 1370
But I need something like this:
Soil Distance.Min Distance.Max Elevation.Min Elevation.Max
1 Forest 1 5 1499 1500
2 Grass 10 56 1456 1470
3 Scrub 59 99 1435 1450
4 Grass 102 139 1400 1430
5 Forest 143 230 1370 1390
I tried to use group_by() and which.min(Soil), but that takes into account the whole df, not each path.
CodePudding user response:
We need a run-length encoding to track consecutive runs of Soil.
Using this function (fashioned to mimic data.table::rleid):
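myrleid <- function (x) {
  # rle() finds runs of identical consecutive values and their lengths
  r <- rle(x)
  # give each run its own integer id, repeated once per element of the run
  rep(seq_along(r$lengths), times = r$lengths)
}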
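To see why this helps, here is a minimal sketch of what the helper returns. The toy vector `soil` is hypothetical and not part of the question's data; the point is that the second run of "Grass" gets a different id from the first, which is what a plain group_by(Soil) cannot do.

```r
# myrleid() as defined above, repeated so this snippet runs on its own
myrleid <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

# Hypothetical toy vector: "Grass" appears in two separate runs
soil <- c("Forest", "Forest", "Grass", "Scrub", "Scrub", "Grass")
ids <- myrleid(soil)
print(ids)
# [1] 1 1 2 3 3 4
```

Note that rle() expects a plain atomic vector, so a factor column would likely need as.character() first.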
We can then do:
library(dplyr)
df %>%
  group_by(grp = myrleid(Soil)) %>%  # one group per consecutive run of Soil
  summarize(Soil = Soil[1], across(c(Distance, Elevation), list(min = min, max = max))) %>%
  select(-grp)
# # A tibble: 5 x 5
# Soil Distance_min Distance_max Elevation_min Elevation_max
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Forest 1 5 1499 1500
# 2 Grass 10 56 1456 1470
# 3 Scrub 59 99 1435 1450
# 4 Grass 102 139 1400 1430
# 5 Forest 143 230 1370 1390
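If the column names should match the requested Distance.Min / Distance.Max style rather than Distance_min / Distance_max, the `.names` argument of across() can probably be used. This is a sketch on a shortened, hypothetical version of the data (`df_small` and `res` are names introduced here for illustration), not a tested drop-in for the full data frame.

```r
library(dplyr)

# Same run-id helper as above, repeated so the snippet is self-contained
myrleid <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

# Shortened, hypothetical version of the example data
df_small <- data.frame(
  Soil      = c("Forest", "Forest", "Grass", "Grass", "Forest"),
  Distance  = c(1, 5, 10, 15, 143),
  Elevation = c(1500, 1499, 1470, 1467, 1390)
)

res <- df_small %>%
  group_by(grp = myrleid(Soil)) %>%
  summarize(
    Soil = Soil[1],
    # "{.col}.{.fn}" builds names such as Distance.Min from column and function names
    across(c(Distance, Elevation), list(Min = min, Max = max),
           .names = "{.col}.{.fn}"),
    .groups = "drop"
  ) %>%
  select(-grp)

print(names(res))
```

The names in the list passed to across() (here Min and Max) become the suffixes, so changing them changes the resulting column names.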
CodePudding user response:
You can try this:
library(dplyr)
# data.table only needs to be installed; rleid() numbers consecutive runs of Soil
df = df %>% mutate(id=data.table::rleid(Soil))
inner_join(
  distinct(df %>% select(Soil,id)),
  df %>%
    group_by(id) %>%
    summarize(across(Distance:Elevation, .fns = list("min" = min,"max"=max))),
  by="id"
) %>% select(!id)
Output:
Soil Distance_min Distance_max Elevation_min Elevation_max
1 Forest 1 5 1499 1500
2 Grass 10 56 1456 1470
3 Scrub 59 99 1435 1450
4 Grass 102 139 1400 1430
5 Forest 143 230 1370 1390
Or, even more concisely, thanks to r2evans:
df %>%
group_by(id = data.table::rleid(Soil)) %>%
summarize(Soil=first(Soil),across(Distance:Elevation, .fns = list("min" = min,"max"=max))) %>%
select(!id)