How to use regex to make even smaller itertools sub groups

Time:08-30

Right now the paths are only grouped by year. I'm trying to group them further by frequency range and by whether the measuring device was horizontal or vertical.

Needed groups (year___Frequency-range___horizontal/vertical) in that order

2020___20-80M___ver  
2020___80M-200MHz___hor   
2020___80M-200MHz___ver  
2020___200M-1000MHz___hor   
2020___200M-1000MHz___ver  
2020___1000M-6000MHz___hor   
2020___1000M-6000MHz___ver  
2020___6000M-8000MHz___hor   
2020___6000M-8000MHz___ver

**next round**

2021___20-80M___ver  
2022___80M-200MHz___hor
paths_by_year = {'2020': ['C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\1000M-6000MHz_BBHA9120J_mobile-rack_200Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\1000M-6000MHz_BBHA9120J_mobile-rack_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\20-80M_70Vm_vertical_VHBD9134', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\200M-1000MHz_STLP9128_FLH1000B_200Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\200M-1000MHz_STLP9128_FLH1000B_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\6000M-8000MHz_BBHA9120J_mobile-rack_200Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\6000M-8000MHz_BBHA9120J_mobile-rack_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\80M-200MHz_STLP9128_extender_PRANA_150Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2020_06_19\\80M-200MHz_STLP9128_extender_PRANA_150Vm_ver'], '2021': ['C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref-cal_200M-1000MHz_STLP9128_FLH1000B_200Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref-cal_200M-1000MHz_STLP9128_FLH1000B_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\Ref_Cal_1000M-6000M_200Vm_hor_BBHA9120J', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\Ref_Cal_1000M-6000M_200Vm_ver_BBHA9120J', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref_cal_20-80M_70Vm_vertical_VHBD9134', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref_cal_6000M-8000MHz_BBHA9120J_mobile-rack_200Vm_hor', 
'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref_cal_6000M-8000MHz_BBHA9120J_mobile-rack_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref_cal_80M-200MHz_STLP9128_extender_PRANA_150Vm_hor', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2021_07_01\\ref_cal_80M-200MHz_STLP9128_extender_PRANA_150Vm_ver'], '2022': ['C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\ref-cal_200M-1000MHz_STLP9128_FLH1000B_200Vm_hor_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\ref-cal_200M-1000MHz_STLP9128_FLH1000B_200Vm_ver', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\ref-cal_80M-200MHz_STLP9128_FLH1000B_200Vm_hor_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\ref-cal_80M-200MHz_STLP9128_FLH1000B_200Vm_ver_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\Ref_Cal_1000M-6000M_200Vm_Hor_BBHA9120J_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\Ref_Cal_1000M-6000M_200Vm_Ver_BBHA9120J_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\ref_cal_20-80M_70Vm_vertical_VHBD9134_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\Ref_Cal_6000M-8000MHz_HA9251-48_200Vm_hor_new_Mount', 'C:\\Users\\M0182965\\Desktop\\emc2\\Primerjave_referencnih_kalibracij\\2022_05_12\\Ref_Cal_6000M-8000MHz_HA9251-48_200Vm_ver_new_Mount']}
    
def extract_frequency(path: string):
    frequency = re.search(r"20-80M", path)
    return frequency.group(0)[1:15]

for key, group in itertools.groupby(tabela_path, extract_frequency):
    paths_by_year[key] = list(group)

print(tabela_path)

CodePudding user response:

First extract the information you want:

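import re

result = []
for key, value in paths_by_year.items():
    for val in value:
        *_, name = val.split('\\')
        # frequency range first, then the last "hor"/"ver" in the folder name
        m = re.search(r"(?:ref[-_]cal)?(\d+M?-\d+MH?z?).*(hor|ver)", name, re.I)
        if m:
            freq_range, direction = m.groups()
            # append only on a match, so stale values are never reused
            result.append((key, freq_range, direction))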
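
To check the pattern in isolation, here is a small sketch that runs it on three folder names taken from the data in the question. The helper name `parse_name` is made up for illustration, and lowercasing the direction is an added assumption so that `Hor` and `hor` would land in the same group.

```python
import re

# Frequency range first, then the last "hor"/"ver" found in the name (case-insensitive).
PATTERN = re.compile(r"(\d+M?-\d+MH?z?).*(hor|ver)", re.I)

def parse_name(name):
    """Return (freq_range, direction), or None when the name does not match."""
    m = PATTERN.search(name)
    if m is None:
        return None
    freq_range, direction = m.groups()
    return freq_range, direction.lower()  # assumption: normalise 'Hor'/'Ver'

samples = [
    "20-80M_70Vm_vertical_VHBD9134",
    "ref-cal_200M-1000MHz_STLP9128_FLH1000B_200Vm_hor_new_Mount",
    "Ref_Cal_1000M-6000M_200Vm_Ver_BBHA9120J_new_Mount",
]
for s in samples:
    print(parse_name(s))
```

Because `.*` is greedy, the direction group picks up the last `hor` or `ver` in the name, which is why `vertical` yields `ver`.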
        

Now you have a list of 3-tuples (key, freq_range, direction). You can sort it with a custom key: first by year, then by the first number of the frequency range (converted to int so that 80 sorts before 1000), and finally by direction.

out = sorted(result, key=lambda x: (x[0], int(x[1].split('-')[0].strip('M')), x[2]))
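
As a rough illustration of why the int conversion matters, the sketch below compares a plain sort with the keyed sort on four tuples from the 2020 data. The function name `sort_key` is hypothetical; it does the same thing as the lambda, except that it also lowercases the direction (an added assumption).

```python
def sort_key(item):
    """Sort by year, then by the numeric start of the range, then by direction."""
    year, freq_range, direction = item
    start = int(freq_range.split('-')[0].strip('M'))  # '80M-200MHz' -> 80
    return (year, start, direction.lower())

rows = [
    ('2020', '1000M-6000MHz', 'hor'),
    ('2020', '20-80M', 'ver'),
    ('2020', '80M-200MHz', 'ver'),
    ('2020', '80M-200MHz', 'hor'),
]

plain = sorted(rows)                 # string comparison: '1000M...' comes before '20-80M'
keyed = sorted(rows, key=sort_key)   # numeric comparison: 20 < 80 < 1000

print(plain[0])
print(keyed[0])
```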

Output:

[('2020', '20-80M', 'ver'),
 ('2020', '80M-200MHz', 'hor'),
 ('2020', '80M-200MHz', 'ver'),
 ('2020', '200M-1000MHz', 'hor'),
 ('2020', '200M-1000MHz', 'ver'),
 ('2020', '1000M-6000MHz', 'hor'),
 ('2020', '1000M-6000MHz', 'ver'),
 ('2020', '6000M-8000MHz', 'hor'),
 ('2020', '6000M-8000MHz', 'ver'),
 ('2021', '20-80M', 'ver'),
 ('2021', '80M-200MHz', 'hor'),
 ('2021', '80M-200MHz', 'ver'),
 ('2021', '200M-1000MHz', 'hor'),
 ('2021', '200M-1000MHz', 'ver'),
 ('2021', '1000M-6000M', 'hor'),
 ('2021', '1000M-6000M', 'ver'),
 ('2021', '6000M-8000MHz', 'hor'),
 ('2021', '6000M-8000MHz', 'ver'),
 ('2022', '20-80M', 'ver'),
 ('2022', '80M-200MHz', 'hor'),
 ('2022', '80M-200MHz', 'ver'),
 ('2022', '200M-1000MHz', 'hor'),
 ('2022', '200M-1000MHz', 'ver'),
 ('2022', '1000M-6000M', 'Hor'),
 ('2022', '1000M-6000M', 'Ver'),
 ('2022', '6000M-8000MHz', 'hor'),
 ('2022', '6000M-8000MHz', 'ver')]

If you want to join each tuple into one string with ___ as the delimiter (the format asked for in the question), just run this line:

list(map('___'.join, out))
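
Since the question asks for groups of paths rather than just labels, here is a minimal sketch of how the same pattern could feed `itertools.groupby`. The paths are shortened, hypothetical stand-ins for the real ones, and the `label` helper and the `unknown` fallback are assumptions made for this example. Note that `groupby` only merges adjacent items, so the list is sorted by the same key first.

```python
import itertools
import re

PATTERN = re.compile(r"(\d+M?-\d+MH?z?).*(hor|ver)", re.I)

def label(item):
    """Build 'year___range___direction' for a (year, path) pair."""
    year, path = item
    name = path.split('\\')[-1]          # last path component (folder name)
    m = PATTERN.search(name)
    if m is None:
        return f"{year}___unknown"       # assumption: bucket for unmatched names
    freq_range, direction = m.groups()
    return f"{year}___{freq_range}___{direction.lower()}"

# Shortened, hypothetical paths; the real ones carry the full directory prefix.
paths_by_year = {
    '2020': ['x\\2020_06_19\\80M-200MHz_STLP9128_extender_PRANA_150Vm_hor',
             'x\\2020_06_19\\20-80M_70Vm_vertical_VHBD9134'],
    '2022': ['x\\2022_05_12\\ref-cal_80M-200MHz_STLP9128_FLH1000B_200Vm_hor_new_Mount'],
}

pairs = [(year, p) for year, paths in paths_by_year.items() for p in paths]
pairs.sort(key=label)  # groupby needs equal keys to be adjacent
groups = {k: [p for _, p in g] for k, g in itertools.groupby(pairs, key=label)}

for k, v in groups.items():
    print(k, len(v))
```

Sorting by the label string is alphabetical, which is enough for grouping; for the numeric display order, the labels can be re-sorted afterwards with a numeric key.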

CodePudding user response:

I think the question could have been explained better. From what I understood, you want something like this (replace "numbers" with whatever values you need):

from collections import defaultdict
import numpy as np

d = defaultdict(dict)

paths = [
    "2020___20-80M___ver",
    "2020___80M-200MHz___hor",
    "2020___80M-200MHz___ver",
    "2020___200M-1000MHz___hor",
    "2020___200M-1000MHz___ver",
    "2020___1000M-6000MHz___hor",
    "2020___1000M-6000MHz___ver",
    "2020___6000M-8000MHz___hor",
    "2020___6000M-8000MHz___ver"]

numbers = np.random.randint(0, 10, size=len(paths)).tolist()

for path, number in zip(paths, numbers):
    groups = path.split('___')
    year, frequency, orientation = groups
    # keep both orientations per frequency instead of overwriting the entry
    d[year].setdefault(frequency, {})[orientation] = number

print(dict(d))