I have a data set and in a column of which each cell contains data which is an mixture of strings and a date like this
AE |
---|
2018-04-30 10:28, Bokning, Johan Skoglund, 1295 imkanal |
2019-08-12 15:27, CrossNej, nan, nan |
2020-06-25 18:21, CrossNej, nan, nan |
:---- |
2018-09-13 12:25, Bokning, Simon Wallin, 2195 im och frånluften |
2019-07-26 16:26, CrossNej, nan, nan |
2020-09-01 14:49, RetentionTelsvarare, nan, nan |
:---- |
2019-02-25 14:00, Bokning, Jan Gunnarsson, Imkanal 1495 kr |
2019-11-07 15:39, CrossNej, nan, nan |
2020-01-14 17:52, CrossNej, nan, nan |
2020-12-16 11:14, CrossRensat 12 mån, nan, nan |
What I want to do is to arrange values in each cell by date in descending order (from highest to lowest). I have tried almost every solution present online but I have been unable to do so. Either it's only a date or only a string. I am splitting it by line break
main_df["Log"] = outputdf["Log"].map(lambda x: '\n'.join(sorted(x.split('\n'))))
Then I am sorting it by setting accending to false. But the data is not changed by this
main_df.explode('Log').sort_values(by='Log', ascending=False, inplace=True)
main_df = main_df.replace(np.nan,0)
Then I tried this
main_df.loc[np.argsort(main_df.Log.str.split('-').str[-1].astype(int).values)]
and
main_df = main_df.reindex(main_df['Log'].str.extract('(\d )', expand=False).astype(int).sort_values(ascending=False).index).reset_index(drop=True)
but I get this
ValueError: invalid literal for int() with base 10
and this
python int too large to convert to c long
Also tried
main_df['Log'] = pd.to_datetime(main_df['Log'])
main_df = main_df.sort_values(by=['Log'], ascending=False)
but nothing seems to be working. Kindly help me solve this problem. A reminder each of the 3 block represents one cell in my column. I want to organize data within a cell not the whole table.
CodePudding user response:
Your description of the problem and expected results is not very clear, I can only guess by myself, try the following code:
def handle(s: str):
f = pd.Series(s.split('\n')).str.split(r',', n=1, expand=True)
f['tmp'] = pd.to_datetime(f[0])
f = f.sort_values('tmp', ascending=False)
res = f[0].str.cat(f[1], sep=', ').str.cat(sep='\n')
return res
res = main_df.log.apply(handle)
print(res)
CodePudding user response:
My suggestion will be first convert your AE column into an array and I believe the array will be look like this.
array = ['2018-04-30 10:28,Bokning,Johan Skoglund,1295 imkanal',
'2019-08-12 15:27, CrossNej, nan, nan',
'2020-06-25 18:21, CrossNej, nan, nan']
Once you convert into an array then you can run sorted function. Your sorted function will be something look like this.
sorted(array, reverse=True, key=lambda x: datetime.datetime.strptime(x.split(',')[0], '%Y-%m-%d %H:%M'))
reverse = True will use for descending.