Elementary Q in python (extracting data from a .txt file without using modules)-CodePudding

I am taking a course in python and one of the problem sets is as follows:

Read in the contents of the file SP500.txt which has monthly data for 2016 and 2017 about the S&P 500 closing prices as well as some other financial indicators, including the “Long Term Interest Rate”, which is interest rate paid on 10-year U.S. government bonds.

Write a program that computes the average closing price (the second column, labeled SP500) and the highest long-term interest rate. Both should be computed only for the period from June 2016 through May 2017. Save the results in the variables mean_SP and max_interest.

SP500.txt:

Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10 1/1/2016,1918.6,43.55,86.5,236.92,2.09,2023.23,45.93,91.22,24.21 2/1/2016,1904.42,43.72,86.47,237.11,1.78,2006.62,46.06,91.11,24 3/1/2016,2021.95,43.88,86.44,238.13,1.89,2121.32,46.04,90.69,25.37 4/1/2016,2075.54,44.07,86.6,239.26,1.81,2167.27,46.02,90.43,25.92 5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69 6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84 7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69 8/1/2016,2170.95,44.84,88.37,240.85,1.56,2251.95,46.51,91.66,26.95 9/1/2016,2157.69,45.03,89.09,241.43,1.63,2232.83,46.6,92.19,26.73 10/1/2016,2143.02,45.25,90.91,241.73,1.76,2214.89,46.77,93.96,26.53 11/1/2016,2164.99,45.48,92.73,241.35,2.14,2241.08,47.07,95.99,26.85 12/1/2016,2246.63,45.7,94.55,241.43,2.49,2324.83,47.29,97.84,27.87 1/1/2017,2275.12,45.93,96.46,242.84,2.43,2340.67,47.25,99.24,28.06 2/1/2017,2329.91,46.15,98.38,243.6,2.42,2389.52,47.33,100.89,28.66 3/1/2017,2366.82,46.38,100.29,243.8,2.48,2425.4,47.53,102.77,29.09 4/1/2017,2359.31,46.66,101.53,244.52,2.3,2410.56,47.67,103.74,28.9 5/1/2017,2395.35,46.94,102.78,244.73,2.3,2445.29,47.92,104.92,29.31 6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75 7/1/2017,2454.1,47.54,105.04,244.79,2.32,2504.72,48.52,107.21,30 8/1/2017,2456.22,47.85,106.06,245.52,2.21,2499.4,48.69,107.92,29.91 9/1/2017,2492.84,48.17,107.08,246.82,2.2,2523.31,48.76,108.39,30.17 10/1/2017,2557,48.42,108.01,246.66,2.36,2589.89,49.05,109.4,30.92 11/1/2017,2593.61,48.68,108.95,246.67,2.35,2626.9,49.3,110.35,31.3 12/1/2017,2664.34,48.93,109.88,246.52,2.4,2700.13,49.59,111.36,32.09

My solution (correct but not optimal):

file = open("SP500.txt", "r")

content = file.readlines()

# List that will hold the range of months we need
data=[]

for line in content:
    # Get a list of values for each line
    values = line.split(',')
    # Return lines with the required dates
    for i in range(6,13):
        month_range = f"{i}/1/2016"
        if month_range == values[0]:
            data.append(values)
    # Return lines with the required dates
    for i in range(1,6):
        month_range = f"{i}/1/2017"
        if month_range == values[0]:
            data.append(values)

sum_total = 0
max_interest = 0
# Loop through the data of our required months
for entry in data:
    # Get the sum total
    sum_price  = float(entry[1])
    # Find the highest interest rate in list
    if max_interest < float(entry[5]):
        max_interest = float(entry[5])

mean_SP = sum_total / len(data)

I'm self-learning these concepts and I would love to learn a better way of implementing this solution. My code seems borderline hard coding (exact date in values[0]) and I imagine it to be error prone for bigger problems. Especially the excessive looping that's being done, which seems quite exaustive for such a simple problem. Thanks in advance.

CodePudding user response：

I think you can optimized by storing the index of columns in some variable

with open('temp.txt') as f:
    lines = f.readlines()
lines = [line.rstrip().split(",") for line in lines]
date_index, spf_index, long_interest_rate = 0, 1, 5
start_date, end_date = "01/06/2016", "31/05/2017"
mean_SP, max_interest = 0, -1000 # Some random negative number

for line in lines[1:]:
    if start_date <= line[date_index] <= end_date:
        mean_SP  = float(line[spf_index])
        max_interest = max(max_interest, float(line[long_interest_rate]))
mean_SP /= len(lines[1:])
print(mean_SP, max_interest)

CodePudding user response：

You can use pandas in order to help you select specific rows/columns of the table :

import pandas as pd
from datetime import datetime

# to read the txt file (if it does not work it's maybe that your txt file is in one line so check that)
df = pd.read_csv("SP500.txt", sep=",")
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x, '%d/%m/%Y').date())
df.head()

         Date    SP500  Dividend  Earnings  Consumer Price Index  \
0  2016-01-01  1918.60     43.55     86.50                236.92   
1  2016-01-02  1904.42     43.72     86.47                237.11   
2  2016-01-03  2021.95     43.88     86.44                238.13   
3  2016-01-04  2075.54     44.07     86.60                239.26   
4  2016-01-05  2065.55     44.27     86.76                240.23   

   Long Interest Rate  Real Price  Real Dividend  Real Earnings   PE10  
0                2.09     2023.23          45.93          91.22  24.21  
1                1.78     2006.62          46.06          91.11  24.00  
2                1.89     2121.32          46.04          90.69  25.37  
3                1.81     2167.27          46.02          90.43  25.92  
4                1.81     2148.15          46.04          90.23  25.69  

# for example to select the rows with a date later than 01-10-2016
df_ex = df[df['Date'] > datetime.strptime('01-10-2016', '%d-%m-%Y').date()]
df_ex.head()

          Date    SP500  Dividend  Earnings  Consumer Price Index  \
12  2017-01-01  2275.12     45.93     96.46                242.84   
13  2017-01-02  2329.91     46.15     98.38                243.60   
14  2017-01-03  2366.82     46.38    100.29                243.80   
15  2017-01-04  2359.31     46.66    101.53                244.52   
16  2017-01-05  2395.35     46.94    102.78                244.73   

    Long Interest Rate  Real Price  Real Dividend  Real Earnings   PE10  
12                2.43     2340.67          47.25          99.24  28.06  
13                2.42     2389.52          47.33         100.89  28.66  
14                2.48     2425.40          47.53         102.77  29.09  
15                2.30     2410.56          47.67         103.74  28.90  
16                2.30     2445.29          47.92         104.92  29.31

Then you can perform the calculations that you want by selecting the appropriate columns/rows this way.