.split() does not split all the words in a line

Here is the exercise: "Write a program that categorizes each mail message by which day of the week the commit was done. To do this look for lines that start with “From”, then look for the third word and keep a running count of each of the days of the week. At the end of the program print out the contents of your dictionary (order does not matter). Sample Line: From [email protected] Sat Jan 5 09:14:16 2008" I need lines like this one from my file to be split, but the output shows only positions zero and one (From and the email address), not the other words. The error in the output is: "list index out of range". Here is my code:

file1= open('short.txt')
counts= dict()
for line in file1:
    line = line.rstrip()
    if not line.startswith('From: '): continue
    splited = line.split()
    print(splited)
    day= splited[2]
    counts[day]= counts.get(day,0) + 1
print('Count of days:',counts)

CodePudding user response：

Please familiarize yourself with the mailbox file format (https://en.wikipedia.org/wiki/Mbox); your input file is probably some variant of it.

The "From" line you want to parse starts with "From " (no colon) and appears at the beginning of each saved message.

The "From: " line (with a colon) is part of the message headers and has a different syntax: it contains an address, not a date. The headers are separated from the message body by a blank line. You should move on to the next message at the blank line.
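To see why the index error shows up, here is a small sketch that splits one line of each kind. The address user@example.com is a placeholder chosen for illustration, not something taken from the original file.

```python
# Placeholder lines; the address is hypothetical.
envelope_line = "From user@example.com Sat Jan  5 09:14:16 2008"
header_line = "From: user@example.com"

envelope_words = envelope_line.split()
header_words = header_line.split()

print(envelope_words)  # seven words; index 2 is the day
print(header_words)    # only two words; there is no index 2

# Indexing position 2 on the header line is what raises the error.
try:
    header_words[2]
except IndexError as exc:
    print("IndexError:", exc)
```

So .split() itself works as expected; the header line simply has no third word to return.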

CodePudding user response：

You are searching for lines that start with "From: ", while the line you want starts with "From " (no colon). I think the code below should work for you:

counts = {}
with open('short.txt') as file1:
    for line in file1:
        line = line.strip()
        if line.startswith('From '):
            day = line.split()[2]
            counts[day] = counts.get(day, 0) + 1
print(counts)

Output:

{'Sat': 1, 'Fri': 20, 'Thu': 6}
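If the file might contain truncated "From " lines, a slightly more defensive variant could look like the sketch below. The function name count_days and the sample lines are made up for illustration, and collections.Counter stands in for the plain dictionary used above.

```python
from collections import Counter


def count_days(lines):
    """Count the third word of every line that starts with 'From '."""
    counts = Counter()
    for line in lines:
        if not line.startswith("From "):
            continue  # skips "From: " headers and everything else
        words = line.split()
        if len(words) < 3:
            continue  # skip a truncated line instead of raising IndexError
        counts[words[2]] += 1
    return counts


# Hypothetical sample data; the addresses are placeholders.
sample = [
    "From user@example.com Sat Jan  5 09:14:16 2008",
    "From: user@example.com",
    "From other@example.com Fri Jan  4 18:10:48 2008",
    "From user@example.com",
]
day_counts = count_days(sample)
print(dict(day_counts))
```

Because the function only iterates over its argument, an open file can be passed in directly, for example count_days(file1) inside the with block.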