Home > Software engineering >  Trying Sort Data in a Cell by Date in Pandas
Trying Sort Data in a Cell by Date in Pandas

Time:03-09

I have a data set and in a column of which each cell contains data which is an mixture of strings and a date like this

AE
2018-04-30 10:28, Bokning, Johan Skoglund, 1295 imkanal
2019-08-12 15:27, CrossNej, nan, nan
2020-06-25 18:21, CrossNej, nan, nan
:----
2018-09-13 12:25, Bokning, Simon Wallin, 2195 im och frånluften
2019-07-26 16:26, CrossNej, nan, nan
2020-09-01 14:49, RetentionTelsvarare, nan, nan
:----
2019-02-25 14:00, Bokning, Jan Gunnarsson, Imkanal 1495 kr
2019-11-07 15:39, CrossNej, nan, nan
2020-01-14 17:52, CrossNej, nan, nan
2020-12-16 11:14, CrossRensat 12 mån, nan, nan

What I want to do is to arrange values in each cell by date in descending order (from highest to lowest). I have tried almost every solution present online but I have been unable to do so. Either it's only a date or only a string. I am splitting it by line break

main_df["Log"] = outputdf["Log"].map(lambda x: '\n'.join(sorted(x.split('\n'))))

Then I am sorting it by setting accending to false. But the data is not changed by this

main_df.explode('Log').sort_values(by='Log', ascending=False, inplace=True)
main_df = main_df.replace(np.nan,0)

Then I tried this

main_df.loc[np.argsort(main_df.Log.str.split('-').str[-1].astype(int).values)]

and

main_df = main_df.reindex(main_df['Log'].str.extract('(\d )', expand=False).astype(int).sort_values(ascending=False).index).reset_index(drop=True)

but I get this

ValueError: invalid literal for int() with base 10

and this

python int too large to convert to c long

Also tried

    main_df['Log'] = pd.to_datetime(main_df['Log'])
    main_df = main_df.sort_values(by=['Log'], ascending=False)

but nothing seems to be working. Kindly help me solve this problem. A reminder each of the 3 block represents one cell in my column. I want to organize data within a cell not the whole table.

CodePudding user response:

Your description of the problem and expected results is not very clear, I can only guess by myself, try the following code:

def handle(s: str):
    f = pd.Series(s.split('\n')).str.split(r',', n=1, expand=True)
    f['tmp'] = pd.to_datetime(f[0])
    f = f.sort_values('tmp', ascending=False)
    res = f[0].str.cat(f[1], sep=', ').str.cat(sep='\n')
    return res


res = main_df.log.apply(handle)
print(res)

CodePudding user response:

My suggestion will be first convert your AE column into an array and I believe the array will be look like this.

array = ['2018-04-30 10:28,Bokning,Johan Skoglund,1295 imkanal',
'2019-08-12 15:27, CrossNej, nan, nan',
'2020-06-25 18:21, CrossNej, nan, nan']

Once you convert into an array then you can run sorted function. Your sorted function will be something look like this.

sorted(array, reverse=True, key=lambda x: datetime.datetime.strptime(x.split(',')[0], '%Y-%m-%d %H:%M'))

reverse = True will use for descending.

  • Related