Create new column based on general date time format DD/MM/YYYY , but encounter error "ValueErro-CodePudding

I have a pandas data frame column with a general date format that looks like the below. My date format is in DD/MM/YYYY.

       dates
0   11/04/2017
1   17/04/2017
2   23/04/2017
3   02/04/2017
4   30/03/2017

I would like to create a new column based on this dates column, e.g. Expected new column

I tried to use the method suggested in this post Create new column based on date column Pandas

But I am encountering an error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [46], in <cell line: 10>()
      1 cutoff = [
      2     '24/04/2017',
      3     '18/04/2017',
   (...)
      6     '31/03/2017',
      7 ]
      9 cutoff = pd.Series(cutoff).astype('datetime64')
---> 10 final_commit['phase'] = pd.cut(final_commit['dates'], cutoff, labels = [ 4, 3, 2, 1])
     11 print(final_commit.sort_values('dates'))

File ~/Library/Python/3.8/lib/python/site-packages/pandas/core/reshape/tile.py:290, in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    288     # GH 26045: cast to float64 to avoid an overflow
    289     if (np.diff(bins.astype("float64")) < 0).any():
--> 290         raise ValueError("bins must increase monotonically.")
    292 fac, bins = _bins_to_cuts(
    293     x,
    294     bins,
   (...)
    301     ordered=ordered,
    302 )
    304 return _postprocess_for_cut(fac, bins, retbins, dtype, original)

ValueError: bins must increase monotonically.

My cutoff for creating the new column is as below

'24/04/2017' -> phase 5
'18/04/2017' -> phase 4
'12/04/2017' -> phase 3
'06/04/2017' -> phase 2
'31/03/2017' -> phase 1

Code I tried

cutoff = [
    '24/04/2017',
    '18/04/2017',
    '12/04/2017',
    '06/04/2017',
    '31/03/2017',
]

cutoff = pd.Series(cutoff).astype('datetime64')
final_commit['phase'] = pd.cut(final_commit['dates'], cutoff, labels = [5, 4, 3, 2, 1])
print(final_commit.sort_values('dates'))

Any suggestion is appreciated. Thank you.

CodePudding user response：

As the error suggests, you need to make sure the cutoff is monotonically increasing. You can pre sort the values using sort_values:

cutoff = pd.to_datetime(cutoff, format='%d/%m/%Y').sort_values()
pd.cut(final_commit['dates'], cutoff, labels=[1,2,3,4])

Example:

final_commit = pd.DataFrame({
    'dates': pd.to_datetime(['2017-04-15',  '2017-04-03'])
})
pd.cut(final_commit['dates'], cutoff, labels=[1,2,3,4])

#0    3
#1    1
#Name: dates, dtype: category
#Categories (4, int64): [1 < 2 < 3 < 4]