Home > database >  Creating row-wise pairs in R
Creating row-wise pairs in R

Time:01-18

I am trying to pair rows for use in a dumbbell plot. I have a df that looks like this:

Year Species Tonnes
1960 Cod 123
1961 Cod 456
1970 Cod 124
1971 Cod 457

I want to pair the up results 10 years apart, resulting in this df:

Year Species Tonnes Pair
1960 Cod 123 1
1961 Cod 456 2
1970 Cod 124 1
1971 Cod 457 2

I would very much appreciate help. I wasn't too sure where to begin with the problem.

CodePudding user response:

You could do

df <- structure(list(Year = c(1960L, 1961L, 1970L, 1971L), Species = c("Cod", 
"Cod", "Cod", "Cod"), Tonnes = c(123, 150, 256, 450)), row.names = c(NA, 
-4L), class = "data.frame")

library(tidyverse)

df %>%
  mutate(year = Year %% 10,
         decade = 10 * Year %/% 10) %>%
  select(-Year) %>%
  group_by(Species, year) %>%
  summarize(from = Tonnes[which.min(decade)],
            to = Tonnes[which.max(decade)],
            year = paste(min(year   decade), max(year   decade), sep = '-')) %>%
  ggplot(aes(from, year))  
  geom_linerange(aes(xmin = from, xmax = to), alpha = 0.5)  
  geom_point(color = 'green4', size = 3)  
  geom_point(aes(x = to), color = 'red3', size = 3)   
  xlab('Tonnes')  
  theme_minimal(base_size = 16)

enter image description here

CodePudding user response:

Using data.table, a join will get the pairs in wide format:

library(data.table)

dt <- setDT(df)[
  , `:=`(Year2 = Year   10, Pair =  rleid(Year, Species))
][
  df,
  .(Year1 = i.Year, Year2 = x.Year, Species, Tonnes1 = i.Tonnes, Tonnes2 = Tonnes, Pair = i.Pair),
  on = .(Year = Year2, Species), nomatch = 0
]
dt
#>    Year1 Year2 Species Tonnes1 Tonnes2 Pair
#> 1:  1960  1970     Cod     123     124    1
#> 2:  1961  1971     Cod     456     457    2

which can be melted to long format, if desired:

setcolorder(
  melt(dt, c("Species", "Pair"), list(c("Year1", "Year2"), c("Tonnes1", "Tonnes2")), value.name = c("Year", "Tonnes")),
  c("Year", "Species", "Tonnes", "Pair")
)[, variable := NULL][]
#>    Year Species Tonnes Pair
#> 1: 1960     Cod    123    1
#> 2: 1961     Cod    456    2
#> 3: 1970     Cod    124    1
#> 4: 1971     Cod    457    2

Data:

df <- data.frame(Year = c(1960, 1961, 1970, 1971), Species = "Cod", Tonnes = c(123, 456, 124, 457))
  • Related