I am trying to pair rows for use in a dumbbell plot. I have a df that looks like this:
Year | Species | Tonnes |
---|---|---|
1960 | Cod | 123 |
1961 | Cod | 456 |
1970 | Cod | 124 |
1971 | Cod | 457 |
I want to pair the up results 10 years apart, resulting in this df:
Year | Species | Tonnes | Pair |
---|---|---|---|
1960 | Cod | 123 | 1 |
1961 | Cod | 456 | 2 |
1970 | Cod | 124 | 1 |
1971 | Cod | 457 | 2 |
I would very much appreciate help. I wasn't too sure where to begin with the problem.
CodePudding user response:
You could do
df <- structure(list(Year = c(1960L, 1961L, 1970L, 1971L), Species = c("Cod",
"Cod", "Cod", "Cod"), Tonnes = c(123, 150, 256, 450)), row.names = c(NA,
-4L), class = "data.frame")
library(tidyverse)
df %>%
mutate(year = Year %% 10,
decade = 10 * Year %/% 10) %>%
select(-Year) %>%
group_by(Species, year) %>%
summarize(from = Tonnes[which.min(decade)],
to = Tonnes[which.max(decade)],
year = paste(min(year decade), max(year decade), sep = '-')) %>%
ggplot(aes(from, year))
geom_linerange(aes(xmin = from, xmax = to), alpha = 0.5)
geom_point(color = 'green4', size = 3)
geom_point(aes(x = to), color = 'red3', size = 3)
xlab('Tonnes')
theme_minimal(base_size = 16)
CodePudding user response:
Using data.table
, a join will get the pairs in wide format:
library(data.table)
dt <- setDT(df)[
, `:=`(Year2 = Year 10, Pair = rleid(Year, Species))
][
df,
.(Year1 = i.Year, Year2 = x.Year, Species, Tonnes1 = i.Tonnes, Tonnes2 = Tonnes, Pair = i.Pair),
on = .(Year = Year2, Species), nomatch = 0
]
dt
#> Year1 Year2 Species Tonnes1 Tonnes2 Pair
#> 1: 1960 1970 Cod 123 124 1
#> 2: 1961 1971 Cod 456 457 2
which can be melted to long format, if desired:
setcolorder(
melt(dt, c("Species", "Pair"), list(c("Year1", "Year2"), c("Tonnes1", "Tonnes2")), value.name = c("Year", "Tonnes")),
c("Year", "Species", "Tonnes", "Pair")
)[, variable := NULL][]
#> Year Species Tonnes Pair
#> 1: 1960 Cod 123 1
#> 2: 1961 Cod 456 2
#> 3: 1970 Cod 124 1
#> 4: 1971 Cod 457 2
Data:
df <- data.frame(Year = c(1960, 1961, 1970, 1971), Species = "Cod", Tonnes = c(123, 456, 124, 457))