Home > Software design >  Allowing month formatting variation when reading files
Allowing month formatting variation when reading files

Time:09-17

Every month I run a script that does a bunch of calculations based on input from Excel sheets. However, the uploaders are inconsistent with how they spell out the month. July/Jul, Sep/Sept/September, etc. So my current solution is prone to errors:

excel = pd.read_excel(f'{month} {year} Monthly Statement.xlsx') 

The input provided by the user in the Python app is numeric, and I was thinking that I could convert this in a dictionary with the different variations.

##user input
while True:
    month = int(8) #user input
    if month not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
        print("That is not a valid month. Remember to type it as a number. January = 1 for example")
    else:
        print("Thank you for giving me the month")
        break

##month translation dict
month_dict = {1: ["Jan", "January"],
             2: ["Feb", "February"],
             8: ["Aug", "August"],
             9: ["Sep", "Sept", "September"]}

What I have tried:

##it should loop over the various way to type out the month and go with the version that does not throw an error:
excel = pd.read_excel(f'{month_dict[month]} {year} Monthly Statement.xlsx')
##but does not work since I am not sure how to loop over the different items in the dict

How can I do this?

Edit: File names are:

January 2021 Monthly Statement.xlsx
Aug 2021 Monthly Statement.xlsx
September 2021 Monthly Statement.xlsx

CodePudding user response:

If the month name may just be shortened, but we can safely assume that it is spelled correctly and that the filename is like your examples, then this should work:

month_dict = {1: "Jan",
             2: "Feb",
             8: "Aug",
             9: "Sep"}

myfile = glob.glob(f'{month_dict[month]}* {year} Monthly Statement.xlsx')
excel = pd.read_excel(myfile[0])

In detail, month-dict[month] when month is 3 gives us "Mar", so we're looking for a file named "Mar* 2021 Monthly Statement.xlsx". This will match "Mar", "Marc", "March", and unfortunately even "Marchxx", but not "Ma" (and this is a good thing because we wouldn't know whether to go for March or May) nor "Mrch" (and this is not so good...)

CodePudding user response:

This is how I ended up solving it per now, but it not as lean as the response from @gimix

##user input
while True:
    month = int(8) #user input
    if month not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
        print("That is not a valid month. Remember to type it as a number. January = 1 for example")
    else:
        print("Thank you for giving me the month")
        break

##dict
month_dict = {1: ["Jan", "January"],
             2: ["Feb", "February"],
             3: ["Mar", "March"],
             4: ["Apr", "April"],
             5: ["May"],
             6: ["Jun", "June"],
             7: ["Jul", "July"],
             8: ["Aug", "August"],
             9: ["Sep", "Sept", "September"],
             10: ["Oct", "October"],
             11: ["Nov", "November"],
             12: ["Dec", "December"]}

##loop
for i in month_dict:
    try:
        ex = pd.read_excel(f'{month_dict[month][i]} 2021 Monthly Statement.xlsx')
        break
    except IOError:
        print("error")
  • Related