Two conditions with a single expression (both options) with regex python

Time:03-28

I wrote expressions that extract every "Dec Weekday" or "Dec Weekend" row together with the numeric values that follow it. Here are the expressions I created:

    consumption_we = re.compile(r'([a-zA-Z]{3} Weekend \d{4}-\d{4}) ([\d,]+\.\d{3}) (\d\.\d{4}) (\$[\d,]+\.\d{2})')
    consumption_wd = re.compile(r'([a-zA-Z]{3} Weekday \d{4}-\d{4}) ([\d,]+\.\d{3}) (\d\.\d{4}) (\$[\d,]+\.\d{2})')

They work, but I don't want to duplicate the same expression for Weekend and Weekday. I was wondering whether a single expression could cover both cases. I tried Weekend | Weekday, but it still gives me only one of the options, most likely the second.

My raw data looks like this:

Dec Weekday 0000-0800 2,242.144    7.4600     $167.26
Dec Weekday 0800-2400 14,178.264   10.8500    $1,538.34
Dec Weekend 0000-0800 785.168      6.2400     $48.99
Dec Weekend 0800-2400 4,972.248    7.9300     $394.30 
Dec Weekday 0000-0800 121.300      7.4600     $9.05
Dec Weekday 0800-2400 767.045      10.8500    $83.22
Dec Weekend 0000-0800 42.478       6.2400     $2.65
Dec Weekend 0800-2400 268.999      7.9300     $21.33

Any help would be appreciated, please.

CodePudding user response:

From the expressions you shared, it looks like you want the Weekday rows and the Weekend rows separately. If you use a single alternation such as (Weekend|Weekday), as others have suggested, every row comes back in one combined list, so you won't get separate data.

To keep them separate, you can loop over the two day types and build the pattern for each one:

import re

s = """
Dec Weekday 0000-0800 2,242.144    7.4600     $167.26
Dec Weekday 0800-2400 14,178.264   10.8500    $1,538.34
Dec Weekend 0000-0800 785.168      6.2400     $48.99
Dec Weekend 0800-2400 4,972.248    7.9300     $394.30 
Dec Weekday 0000-0800 121.300      7.4600     $9.05
Dec Weekday 0800-2400 767.045      10.8500    $83.22
Dec Weekend 0000-0800 42.478       6.2400     $2.65
Dec Weekend 0800-2400 268.999      7.9300     $21.33
"""
days = ["Weekday", "Weekend"]

for day in days:
    pattern = r'([a-zA-Z]{3} ' + day + r' \d{4}-\d{4}) ([\d,]+\.\d{3})\s+(\d+\.\d{4})\s+(\$[\d,]+\.\d{2})'
    print(f"Day: {day}", re.findall(pattern, s))
"""
Day: Weekday [('Dec Weekday 0000-0800', '2,242.144', '7.4600', '$167.26'), ('Dec Weekday 0800-2400', '14,178.264', '10.8500', '$1,538.34'), ('Dec Weekday 0000-0800', '121.300', '7.4600', '$9.05'), ('Dec Weekday 0800-2400', '767.045', '10.8500', '$83.22')]
Day: Weekend [('Dec Weekend 0000-0800', '785.168', '6.2400', '$48.99'), ('Dec Weekend 0800-2400', '4,972.248', '7.9300', '$394.30'), ('Dec Weekend 0000-0800', '42.478', '6.2400', '$2.65'), ('Dec Weekend 0800-2400', '268.999', '7.9300', '$21.33')]
"""

CodePudding user response:

First of all, I don't think your current regex is working, because it does not match the run of spaces between columns 4 and 5: the data pads the columns with several spaces, while the pattern allows only one.
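The spacing issue can be checked on a single row. In this minimal sketch, `strict` is an assumed reading of the Weekday pattern from the question (single literal spaces between columns and a one-digit rate), and `loose` is a hypothetical variant that uses `\s+` and `\d+` instead. Both names are illustrative only.

```python
import re

line = "Dec Weekday 0800-2400 14,178.264   10.8500    $1,538.34"

# Assumed form of the question's pattern: exactly one space between columns.
strict = re.compile(
    r'([a-zA-Z]{3} Weekday \d{4}-\d{4}) ([\d,]+\.\d{3}) (\d\.\d{4}) (\$[\d,]+\.\d{2})'
)

# \s+ tolerates the column padding; \d+ allows rates such as 10.8500.
loose = re.compile(
    r'([a-zA-Z]{3} Weekday \d{4}-\d{4})\s+([\d,]+\.\d{3})\s+(\d+\.\d{4})\s+(\$[\d,]+\.\d{2})'
)

strict_match = strict.search(line)
loose_match = loose.search(line)

print(strict_match)          # None: the padded columns do not match a single space
print(loose_match.groups())  # the four captured fields
```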
