How can I sort list lines by time if the times are in different locations in the line?

Time:11-15

I am writing a program that sorts lines of weather data chronologically. The lines I receive are formatted slightly differently from one another: depending on the weather behaviour and how quickly conditions are changing, a line starts with either FM or BECMG.

I am able to sort lines where the time is at the same index in every line (index [0]), for example:

FM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010 

and

FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 FEW030

In the two examples above, the first line's time is the 13th day of the month at 12:00, and the second line's time is the 13th day of the month at 14:00. This format is fine because the time is at the same index in both lines, but in a situation like the one below my sorting doesn't work.

FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010

and

BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030

Here the first line has the same format as the previous examples, but in the second line the time is at a different location (index [1]) and in a different format. The time in the second line is the 13th day of the month at 15:00.

I have this as an example for how I am sorting them chronologically for now, but it only works if the line has the time at index [0].

import re

total_print = ['\nBECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030', '\nFM131200 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010',
               '\nFM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010']

data = {
    'other': [], # anything with FM or BECMG
}

for line in total_print:
    key = 'other'
    data[key].append(line)

final = []
data['other'] = sorted(data['other'], key=lambda x: x.split()[0])

for lst in data.values():
    for line in lst:
        final.append('\n' + line[1:])

print(' '.join(final))

The lines are supplied in random order: sometimes they all start with BECMG, sometimes all with FM, and sometimes there is a mix of both. So I need a way to sort them however they arrive.

How can I sort the lines in chronological order regardless of whether a line starts with FM or BECMG? Should I use a regex to isolate the times? Can anyone help with this, please? I am stuck.

CodePudding user response:

You can extract the times using a regex and then use the extracted time as the sort key:

import re

pattern = r"((?<=FM)\d{6})|(?<=BECMG )\d{4}"
matcher = re.compile(pattern)

data = ['FM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010 ',
 'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 FEW030',
 'BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030',
 'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010']

print(sorted(data, key=lambda item: matcher.search(item).group()))

This will print:

['FM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010 ',
 'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 FEW030',
 'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010',
 'BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030']
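
The string keys above compare correctly because both forms begin with the same day-and-hour digits. As a minimal sketch of a variant that makes the comparison explicit, the key below converts each time into a (day, hour, minute) tuple of integers. The names `TIME_RE` and `time_key` are made up for this illustration, and BECMG start times are assumed to fall on the hour (minute 0).

```python
import re

# Hypothetical pattern: FM is followed by DDHHMM, BECMG by a space and DDHH.
TIME_RE = re.compile(r"FM(?P<fm>\d{6})|BECMG (?P<becmg>\d{4})")

def time_key(line):
    """Return (day, hour, minute) for an FM or BECMG line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"No FM or BECMG time found: {line!r}")
    if match.group("fm") is not None:
        digits = match.group("fm")
        return int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    digits = match.group("becmg")
    # Assumption: a BECMG period starts on the hour, so the minute is 0.
    return int(digits[0:2]), int(digits[2:4]), 0

lines = [
    "BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030",
    "FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010",
    "FM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010",
]
ordered = sorted(lines, key=time_key)
print("\n".join(ordered))
```

Because `search` scans the whole string, this key also works on lines that still carry a leading newline, as in the question's `total_print` list.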

CodePudding user response:

If the line begins with FM, the day and hour are at string indices 2 to 5 (counting from 0). If the line begins with BECMG, they are at indices 6 to 9.

You can use that as the key for sorting:

data = ['\nFM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010 ',
 '\nFM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 FEW030',
 '\nBECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030',
 '\nFM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010']

data = [s.strip() for s in data]

def sorting_key(s):
  if s.startswith('FM'):
    return int(s[2:4]), int(s[4:6])
  elif s.startswith('BECMG'):
    return int(s[6:8]), int(s[8:10])
  else:
    raise ValueError(''.join(['Neither FM nor BECMG: ', s]))

data = sorted(data, key=sorting_key)

print(data)
# ['FM131200 20010KT 5000 SHOWERS OF LIGHT RAIN SCT006 BKN010',
#  'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 FEW030',
#  'FM131400 20010KT 9999 SHOWERS OF LIGHT RAIN SCT006 BKN010',
#  'BECMG 1315/1317 27007KT 9999 SHOWERS OF LIGHT RAIN SCT020 BKN030']

This code will raise ValueError when it fails to extract the time correctly:

sorted(['hello'], key=sorting_key)
# ValueError: Neither FM nor BECMG: hello

sorted(['FM13hello'], key=sorting_key)
# ValueError: invalid literal for int() with base 10: 'he'
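
If raising on a malformed line is not wanted, one option is to sort such lines after the valid ones. The following is only a sketch under that assumption (the question does not say how bad lines should be handled); `tolerant_key` is a hypothetical wrapper name, and `sorting_key` is repeated here so the block runs on its own.

```python
def sorting_key(s):
    # Same slicing logic as above: (day, hour) taken from fixed positions.
    if s.startswith('FM'):
        return int(s[2:4]), int(s[4:6])
    elif s.startswith('BECMG'):
        return int(s[6:8]), int(s[8:10])
    raise ValueError('Neither FM nor BECMG: ' + s)

def tolerant_key(s):
    # Hypothetical wrapper: valid lines get a leading 0 and sort by (day, hour);
    # lines that fail to parse get a leading 1, so they sort last and keep
    # their input order (sorted() is stable).
    try:
        return (0, *sorting_key(s))
    except ValueError:
        return (1,)

mixed = ['hello', 'BECMG 1315/1317 27007KT', 'FM13hello', 'FM131200 20010KT']
result = sorted(mixed, key=tolerant_key)
print(result)
# ['FM131200 20010KT', 'BECMG 1315/1317 27007KT', 'hello', 'FM13hello']
```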