How do I reorder a long string of concatenated date and timestamps separated by commas using Python?

Time:05-13

I have a string type column called 'datetimes' that contains multiple dates with their timestamps, and I'm trying to extract the earliest and last dates (without the timestamps) into new columns called 'earliest_date' and 'last date'.

The problem, however, is that the dates are not in order, so it's not as straightforward as using a str.split() method to get the first and last dates in the string. I need to order them first in ascending order.

Here's an example of an entry for one of the rows: 2022-04-13 04:47:00,2022-04-07 01:58:00,2022-03-31 02:32:00,2022-03-25 11:59:00,2022-04-12 05:07:00,2022-03-29 01:46:00,2022-03-31 05:52:00,

As you can see, the order is randomized. I would first like to remove the timestamps, which fortunately sit between a space and a comma, then sort the dates in ascending order, and finally put the min and max dates into two separate columns.

Can anyone please help me? Thanks in advance :)

`df['Campaign Interaction Dates'] = df['Campaign Interaction Dates'].str.replace('/','-')

def normalise(d):
    if len(t := d.split('-')) == 3:
        return d if len(t[0]) == 4 else '-'.join(reversed(t))
    return '9999-99-99'

out = sorted(normalise(t[:10]) for t in str(df[df['Campaign Interaction Dates']]).split(',') if t)

df['out'] = out[1]

print(display(df[df['Number of Campaign Codes registered']==3]))`

CodePudding user response:

You can use the following code if you are not sure that the date format will always be YYYY-MM-DD:

import datetime

string = "2022-04-13 04:47:00,2022-04-07 01:58:00,2022-03-31 02:32:00,2022-03-25 11:59:00,2022-04-12 05:07:00,2022-03-29 01:46:00,2022-03-31 05:52:00"
# Keep only the first 10 characters (the date part) of each entry.
dates_list = [date[:10] for date in string.split(',')]
# Sort chronologically by parsing each date string.
dates_list.sort(key=lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
min_date, max_date = dates_list[0], dates_list[-1]

You can easily replace the date format here by changing the format string passed to strptime.
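As a rough sketch of how the format string could be made a parameter, the helper below parses each date with `strptime`, skips the empty piece left by a trailing comma like the one in the question's example, and returns the earliest and latest dates. The name `date_bounds` and its `fmt` argument are made up for illustration, and the code assumes every entry looks like `<date> <time>`:

```python
from datetime import datetime

def date_bounds(s, fmt="%Y-%m-%d"):
    """Return (earliest, latest) dates found in a comma-separated string.

    Hypothetical helper: assumes each entry looks like '<date> <time>'
    and that the date part matches fmt.
    """
    # Keep only the date part of each non-empty entry.
    parts = [p.split()[0] for p in s.split(",") if p.strip()]
    dates = [datetime.strptime(p, fmt).date() for p in parts]
    return min(dates), max(dates)

row = "2022-04-13 04:47:00,2022-04-07 01:58:00,2022-03-25 11:59:00,"
lo, hi = date_bounds(row)
print(lo, hi)  # 2022-03-25 2022-04-13
```

For a DataFrame column, something along the lines of `df['datetimes'].apply(lambda s: date_bounds(s)[0])` should give the earliest date per row. An empty cell would raise a `ValueError` from `min()`, so such rows would need handling first.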

CodePudding user response:

The dates are conveniently in the YYYY-MM-DD format, which means that they can be sorted lexically. So it's just:

row = '13-04-2022 04:47:00,2022-04-07 01:58:00,2022-03-31 02:32:00,2022-03-25 11:59:00,2022-04-12 05:07:00,2022-03-29 01:46:00,2022-03-31 05:52:00'

def normalise(d):
    if len(t := d.split('-')) == 3:
        return d if len(t[0]) == 4 else '-'.join(reversed(t))
    return '9999-99-99'

out = sorted(normalise(t[:10]) for t in row.split(',') if t)

print(f'Lo={out[0]}')
print(f'Hi={out[-1]}')

Output:

Lo=2022-03-25
Hi=2022-04-13

EDIT:

Added normalisation to handle either YYYY-MM-DD or DD-MM-YYYY. The normal form will be YYYY-MM-DD. Any other (unexpected) format will cause your computer to melt down.

Note that the first date in the input string is now in DD-MM-YYYY format and yet the output is the same as before (as expected).
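To illustrate why the normal form matters, here is a small sketch comparing a plain string sort on YYYY-MM-DD dates with the same sort on DD-MM-YYYY dates. The sample values are made up for the example. Only the year-first list comes out in chronological order:

```python
iso = ["2022-04-13", "2022-03-25", "2022-04-07"]
dmy = ["13-04-2022", "25-03-2022", "07-04-2022"]

# Year-first strings sort chronologically as plain text.
print(sorted(iso))  # ['2022-03-25', '2022-04-07', '2022-04-13']

# Day-first strings sort by day of month, which is not chronological.
print(sorted(dmy))  # ['07-04-2022', '13-04-2022', '25-03-2022']

# Reversing the hyphen-separated parts restores the year-first form.
fixed = sorted("-".join(reversed(d.split("-"))) for d in dmy)
print(fixed == sorted(iso))  # True
```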

CodePudding user response:

string = "2022-04-13 04:47:00,2022-04-07 01:58:00,2022-03-31 02:32:00,2022-03-25 11:59:00,2022-04-12 05:07:00,2022-03-29 01:46:00,2022-03-31 05:52:00"
split_string = string.split(",")
split_string.sort()

new_list = []

for i in split_string:
    temp_list = i.split()
    new_list.append(temp_list[0])

max_date = new_list[-1]
min_date = new_list[0]