I wrote a regular expression that extracts every Dec Weekday or Dec Weekend row together with the numeric values that follow it. Here are the expressions I created:
consumption_we = re.compile(r'([a-zA-Z]{3} Weekend \d{4}-\d{4}) ([\d,]+\.\d{3}) (\d\.\d{4}) (\$[\d,]+\.\d{2})')
consumption_wd = re.compile(r'([a-zA-Z]{3} Weekday \d{4}-\d{4}) ([\d,]+\.\d{3}) (\d\.\d{4}) (\$[\d,]+\.\d{2})')
It works, but I don't want to duplicate the same expression for Weekend and Weekday. I was wondering whether a single expression could cover both cases. I tried Weekend|Weekday, but it still gives me only one of the options, most likely the second.
My raw data looks like this:
Dec Weekday 0000-0800 2,242.144 7.4600 $167.26
Dec Weekday 0800-2400 14,178.264 10.8500 $1,538.34
Dec Weekend 0000-0800 785.168 6.2400 $48.99
Dec Weekend 0800-2400 4,972.248 7.9300 $394.30
Dec Weekday 0000-0800 121.300 7.4600 $9.05
Dec Weekday 0800-2400 767.045 10.8500 $83.22
Dec Weekend 0000-0800 42.478 6.2400 $2.65
Dec Weekend 0800-2400 268.999 7.9300 $21.33
Any help would be appreciated, please.
CodePudding user response:
From the expression you shared, it looks like you want the weekday and weekend rows separately. If you use (Weekend|Weekday), as some have suggested, every row matches the same pattern and you won't get separate data.
To keep them separate, you can build one pattern per day name in a loop:
import re
s = """
Dec Weekday 0000-0800 2,242.144 7.4600 $167.26
Dec Weekday 0800-2400 14,178.264 10.8500 $1,538.34
Dec Weekend 0000-0800 785.168 6.2400 $48.99
Dec Weekend 0800-2400 4,972.248 7.9300 $394.30
Dec Weekday 0000-0800 121.300 7.4600 $9.05
Dec Weekday 0800-2400 767.045 10.8500 $83.22
Dec Weekend 0000-0800 42.478 6.2400 $2.65
Dec Weekend 0800-2400 268.999 7.9300 $21.33
"""
days = ["Weekday", "Weekend"]
for day in days:
    # Build one pattern per day name so the matches stay separate.
    pattern = r'([a-zA-Z]{3} ' + day + r' \d{4}-\d{4}) ([\d,]+\.\d{3})\s+(\d+\.\d{4})\s+(\$[\d,]+\.\d{2})'
    print(f"Day: {day}", re.findall(pattern, s))
"""
Day: Weekday [('Dec Weekday 0000-0800', '2,242.144', '7.4600', '$167.26'), ('Dec Weekday 0800-2400', '14,178.264', '10.8500', '$1,538.34'), ('Dec Weekday 0000-0800', '121.300', '7.4600', '$9.05'), ('Dec Weekday 0800-2400', '767.045', '10.8500', '$83.22')]
Day: Weekend [('Dec Weekend 0000-0800', '785.168', '6.2400', '$48.99'), ('Dec Weekend 0800-2400', '4,972.248', '7.9300', '$394.30'), ('Dec Weekend 0000-0800', '42.478', '6.2400', '$2.65'), ('Dec Weekend 0800-2400', '268.999', '7.9300', '$21.33')]
"""
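If you would rather compile a single expression, here is a minimal sketch of one way to do it. The idea is to put the alternation inside its own group, so that | applies only to the day name and not to the whole pattern, and then to sort the matches by the captured day. The names pattern and rows_by_day are just for illustration, and I am assuming the columns are separated by whitespace as in the sample data.

```python
import re
from collections import defaultdict

# Sample rows copied from the question.
s = """
Dec Weekday 0000-0800 2,242.144 7.4600 $167.26
Dec Weekday 0800-2400 14,178.264 10.8500 $1,538.34
Dec Weekend 0000-0800 785.168 6.2400 $48.99
Dec Weekend 0800-2400 4,972.248 7.9300 $394.30
Dec Weekday 0000-0800 121.300 7.4600 $9.05
Dec Weekday 0800-2400 767.045 10.8500 $83.22
Dec Weekend 0000-0800 42.478 6.2400 $2.65
Dec Weekend 0800-2400 268.999 7.9300 $21.33
"""

# Assumption: columns are separated by one or more whitespace characters.
# (Weekday|Weekend) is grouped, so the | only chooses between the two day names.
pattern = re.compile(
    r'([a-zA-Z]{3}) (Weekday|Weekend) (\d{4}-\d{4})\s+'
    r'([\d,]+\.\d{3})\s+(\d+\.\d{4})\s+(\$[\d,]+\.\d{2})'
)

# Collect the rows under the day name captured by the second group.
rows_by_day = defaultdict(list)
for month, day, hours, usage, rate, cost in pattern.findall(s):
    rows_by_day[day].append((f"{month} {day} {hours}", usage, rate, cost))

for day, rows in rows_by_day.items():
    print(f"Day: {day}", rows)
```

This scans the text once instead of once per day name, and rows_by_day["Weekday"] and rows_by_day["Weekend"] hold the same tuples as the loop version above.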
CodePudding user response:
First of all, I don't think your current regex works, as it doesn't match the spaces between columns 4 and 5. Here is the screenshot from