Home > Blockchain >  Sorting dates row-wise
Sorting dates row-wise

Time:05-28

I have the following dataset:

my_data <- structure(list(id = 1:5, d1 = structure(c(16764, 11375, 13873, 
10665, 14395), class = "Date"), d2 = structure(c(14487, 14709, 
11417, 13531, 13457), class = "Date"), d3 = structure(c(15706, 
13542, 16722, 17001, 12621), class = "Date"), d4 = structure(c(14628, 
11210, 14134, 14597, 15119), class = "Date"), d5 = structure(c(10664, 
18150, 15536, 16109, 12236), class = "Date")), row.names = c(NA, 
5L), class = "data.frame")

 id         d1         d2         d3         d4         d5
1  1 2015-11-25 2009-08-31 2013-01-01 2010-01-19 1999-03-14
2  2 2001-02-22 2010-04-10 2007-01-29 2000-09-10 2019-09-11
3  3 2007-12-26 2001-04-05 2015-10-14 2008-09-12 2012-07-15
4  4 1999-03-15 2007-01-18 2016-07-19 2009-12-19 2014-02-08
5  5 2009-05-31 2006-11-05 2004-07-22 2011-05-25 2003-07-03
  • For each row, I would like the smallest of the 5 dates to be assigned to d1, the second smallest of the 5 dates assigned to d2, the third smallest of the 5 dates assigned to d3, the fourth smallest of the 5 dates assigned to d4, and the least smallest of the 5 dates assigned to d5.

This would look something like this (e.g. first row - repeat for the remaining 99 others):

id         d1         d2         d3         d4         d5
1  1 1999-03-14 2009-08-31 2010-01-19  2013-01-01 2015-11-25

I tried looking for similar examples on Stackoverflow, but so far I could not find anything that matches what I am trying to accomplish (e.g. Sorting each row of a data frame).

Can someone please show me how to do this?

Thanks!

Note: Closest example I got to work (does not involve dates):

a = rnorm(100,100,100)
b = rnorm(100,100, 100)
c = rnorm(100,100,100)
d = rnorm(100,100,100)
e = rnorm(100,100,100)

df = data.frame(a,b,c,d,e)

 head(df)
          a          b         c         d         e
1  56.48320 -55.274406  83.32993 -96.55970 181.65859
2  94.17800  88.084876  31.58830 -44.06487 156.85318
3 257.56950  28.500542  34.76845  57.00857  51.76036
4 100.22489   6.946803  60.88848 116.18413  37.34444
5  21.77935  -7.538119  89.29565 -67.28311  43.98728
6 200.18950  -1.555829 123.91148 106.45983 107.50339

# now sort them (check if for each row, numbers increase from left to right)

n = data.frame(t(apply(df, 1, sort)))

          X1         X2        X3        X4        X5
1 -96.559698 -55.274406  56.48320  83.32993 181.65859
2 -44.064873  31.588295  88.08488  94.17800 156.85318
3  28.500542  34.768449  51.76036  57.00857 257.56950
4   6.946803  37.344439  60.88848 100.22489 116.18413
5 -67.283113  -7.538119  21.77935  43.98728  89.29565
6  -1.555829 106.459828 107.50339 123.91148 200.18950

CodePudding user response:

Base R:

Here we transpose my_data after we sort all columns of my_data but the first one using the apply function. Finally we add the first column with cbind(my_data[1], ...

my_data_changed <- cbind(my_data[1], t(apply(my_data[-1], 1, sort)))
my_data_changed <- setnames(my_data_changed, colnames(my_data))
  id         d1         d2         d3         d4         d5
1  1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
2  2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
3  3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
4  4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
5  5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

OR

We could first bring to long format with pivot_longer then use sort and pivot back with pivot_wider.

Trick: mutate(value = sort(value) changes only value

library(dplyr)
library(tidyr)

my_data %>% 
  pivot_longer(
    -id
  ) %>% 
  group_by(id) %>% 
  mutate(value = sort(value)) %>% 
  pivot_wider()
    id d1         d2         d3         d4         d5        
  <int> <date>     <date>     <date>     <date>     <date>    
1     1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
2     2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
3     3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
4     4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
5     5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

CodePudding user response:

The apply approach also works with dates. They just get coerced to a character matrix, but we can coerce as.data.frame and lapply as.Date over it.

my_data[-1] <- as.data.frame(t(apply(my_data[-1], 1, sort))) |> lapply(as.Date)

Gives

my_data
#   id         d1         d2         d3         d4         d5
# 1  1 1999-03-14 2009-08-31 2010-01-19 2013-01-01 2015-11-25
# 2  2 2000-09-10 2001-02-22 2007-01-29 2010-04-10 2019-09-11
# 3  3 2001-04-05 2007-12-26 2008-09-12 2012-07-15 2015-10-14
# 4  4 1999-03-15 2007-01-18 2009-12-19 2014-02-08 2016-07-19
# 5  5 2003-07-03 2004-07-22 2006-11-05 2009-05-31 2011-05-25

Where

str(my_data)
# 'data.frame': 5 obs. of  6 variables:
# $ id: int  1 2 3 4 5
# $ d1: Date, format: "1999-03-14" "2000-09-10" "2001-04-05" "1999-03-15" ...
# $ d2: Date, format: "2009-08-31" "2001-02-22" "2007-12-26" "2007-01-18" ...
# $ d3: Date, format: "2010-01-19" "2007-01-29" "2008-09-12" "2009-12-19" ...
# $ d4: Date, format: "2013-01-01" "2010-04-10" "2012-07-15" "2014-02-08" ...
# $ d5: Date, format: "2015-11-25" "2019-09-11" "2015-10-14" "2016-07-19" ...
  • Related