Home > Mobile >  arranging data by date (month/day format)
arranging data by date (month/day format)

Time:10-05

After I append 4 different dataframes in:

list_1 = [ ]

I have the following data stored in list_1:

| date       | 16/17 | 
| --------   | ------|
| 2016-12-29 | 50    | 
| 2016-12-30 | 52    | 
| 2017-01-01 | 53    | 
| 2017-01-02 | 51    |
[4 rows x 1 columns],
             16/17

| date       | 17/18 | 
| --------   | ------| 
| 2017-12-29 | 60    | 
| 2017-12-31 | 62    | 
| 2018-01-01 | 64    | 
| 2018-01-03 | 65    | 
[4 rows x 1 columns],
             17/18

| date       | 18/19 |
| --------   | ------| 
| 2018-12-30 | 54    | 
| 2018-12-31 | 53    | 
| 2019-01-02 | 52    | 
| 2019-01-03 | 51    | 
[4 rows x 1 columns],
             18/19

| date       | 19/20 |
| --------   | ------| 
| 2019-12-29 | 62    | 
| 2019-12-30 | 63    | 
| 2020-01-01 | 62    | 
| 2020-01-02 | 60    | 
[4 rows x 1 columns],
             19/20

For changing the date format to month/day I use the following code:

pd.to_datetime(df['date']).dt.strftime('%m/%d')

But the problem is when I want to arrange the data by months/days like that:

| date     | 16/17 | 17/18 | 18/19 | 19/20 |
| -------- | ------| ------| ------| ------|
| 12/29    | 50    | 60    | NaN   | 62    |
| 12/30    | 52    | NaN   | 54    | 63    |
| 12/31    | NaN   | 62    | 53    | NaN   |
| 01/01    | 53    | 64    | NaN   | 62    |
| 01/02    | 51    | NaN   | 52    | 60    |
| 01/03    | NaN   | 65    | 51    | NaN   |

I've tried the following:

df = pd.concat(list_1,axis=1)

also:

df = pd.concat(list_1)
df.reset_index(inplace=True)
df = df.groupby(['date']).first()

also:

df = pd.concat(list_1)
df.reset_index(inplace=True)
df = df.groupby(['date'] sort=False).first()

but still cannot achieve the desired result.

CodePudding user response:

You can use sort=False in groupby and create new column for subtract by first value of DatetimeIndex and use it for sorting:

def f(x):
    x.index = pd.to_datetime(x.index)
    return x.assign(new =  x.index - x.index.min())

L = [x.pipe(f) for x in list_1]
df = pd.concat(L, axis=0).sort_values('new', kind='mergesort')

df = df.groupby(df.index.strftime('%m/%d'), sort=False).first().drop('new', axis=1)
print (df)
       16/17  17/18  18/19  19/20
date                             
12/29   50.0   60.0    NaN   62.0
12/30   52.0    NaN   54.0   63.0
12/31    NaN   62.0   53.0    NaN
01/01   53.0   64.0    NaN   62.0
01/02   51.0    NaN   52.0   60.0
01/03    NaN   65.0   51.0    NaN
  • Related