I am taking a course in python and one of the problem sets is as follows:
Read in the contents of the file SP500.txt which has monthly data for 2016 and 2017 about the S&P 500 closing prices as well as some other financial indicators, including the “Long Term Interest Rate”, which is interest rate paid on 10-year U.S. government bonds.
Write a program that computes the average closing price (the second column, labeled SP500) and the highest long-term interest rate. Both should be computed only for the period from June 2016 through May 2017. Save the results in the variables mean_SP and max_interest.
SP500.txt:
Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10 1/1/2016,1918.6,43.55,86.5,236.92,2.09,2023.23,45.93,91.22,24.21 2/1/2016,1904.42,43.72,86.47,237.11,1.78,2006.62,46.06,91.11,24 3/1/2016,2021.95,43.88,86.44,238.13,1.89,2121.32,46.04,90.69,25.37 4/1/2016,2075.54,44.07,86.6,239.26,1.81,2167.27,46.02,90.43,25.92 5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69 6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84 7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69 8/1/2016,2170.95,44.84,88.37,240.85,1.56,2251.95,46.51,91.66,26.95 9/1/2016,2157.69,45.03,89.09,241.43,1.63,2232.83,46.6,92.19,26.73 10/1/2016,2143.02,45.25,90.91,241.73,1.76,2214.89,46.77,93.96,26.53 11/1/2016,2164.99,45.48,92.73,241.35,2.14,2241.08,47.07,95.99,26.85 12/1/2016,2246.63,45.7,94.55,241.43,2.49,2324.83,47.29,97.84,27.87 1/1/2017,2275.12,45.93,96.46,242.84,2.43,2340.67,47.25,99.24,28.06 2/1/2017,2329.91,46.15,98.38,243.6,2.42,2389.52,47.33,100.89,28.66 3/1/2017,2366.82,46.38,100.29,243.8,2.48,2425.4,47.53,102.77,29.09 4/1/2017,2359.31,46.66,101.53,244.52,2.3,2410.56,47.67,103.74,28.9 5/1/2017,2395.35,46.94,102.78,244.73,2.3,2445.29,47.92,104.92,29.31 6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75 7/1/2017,2454.1,47.54,105.04,244.79,2.32,2504.72,48.52,107.21,30 8/1/2017,2456.22,47.85,106.06,245.52,2.21,2499.4,48.69,107.92,29.91 9/1/2017,2492.84,48.17,107.08,246.82,2.2,2523.31,48.76,108.39,30.17 10/1/2017,2557,48.42,108.01,246.66,2.36,2589.89,49.05,109.4,30.92 11/1/2017,2593.61,48.68,108.95,246.67,2.35,2626.9,49.3,110.35,31.3 12/1/2017,2664.34,48.93,109.88,246.52,2.4,2700.13,49.59,111.36,32.09
My solution (correct but not optimal):
file = open("SP500.txt", "r")
content = file.readlines()
# List that will hold the range of months we need
data=[]
for line in content:
# Get a list of values for each line
values = line.split(',')
# Return lines with the required dates
for i in range(6,13):
month_range = f"{i}/1/2016"
if month_range == values[0]:
data.append(values)
# Return lines with the required dates
for i in range(1,6):
month_range = f"{i}/1/2017"
if month_range == values[0]:
data.append(values)
sum_total = 0
max_interest = 0
# Loop through the data of our required months
for entry in data:
# Get the sum total
sum_price = float(entry[1])
# Find the highest interest rate in list
if max_interest < float(entry[5]):
max_interest = float(entry[5])
mean_SP = sum_total / len(data)
I'm self-learning these concepts and I would love to learn a better way of implementing this solution. My code seems borderline hard coding (exact date in values[0]) and I imagine it to be error prone for bigger problems. Especially the excessive looping that's being done, which seems quite exaustive for such a simple problem. Thanks in advance.
CodePudding user response:
I think you can optimized by storing the index of columns in some variable
with open('temp.txt') as f:
lines = f.readlines()
lines = [line.rstrip().split(",") for line in lines]
date_index, spf_index, long_interest_rate = 0, 1, 5
start_date, end_date = "01/06/2016", "31/05/2017"
mean_SP, max_interest = 0, -1000 # Some random negative number
for line in lines[1:]:
if start_date <= line[date_index] <= end_date:
mean_SP = float(line[spf_index])
max_interest = max(max_interest, float(line[long_interest_rate]))
mean_SP /= len(lines[1:])
print(mean_SP, max_interest)
CodePudding user response:
You can use pandas in order to help you select specific rows/columns of the table :
import pandas as pd
from datetime import datetime
# to read the txt file (if it does not work it's maybe that your txt file is in one line so check that)
df = pd.read_csv("SP500.txt", sep=",")
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x, '%d/%m/%Y').date())
df.head()
Date SP500 Dividend Earnings Consumer Price Index \
0 2016-01-01 1918.60 43.55 86.50 236.92
1 2016-01-02 1904.42 43.72 86.47 237.11
2 2016-01-03 2021.95 43.88 86.44 238.13
3 2016-01-04 2075.54 44.07 86.60 239.26
4 2016-01-05 2065.55 44.27 86.76 240.23
Long Interest Rate Real Price Real Dividend Real Earnings PE10
0 2.09 2023.23 45.93 91.22 24.21
1 1.78 2006.62 46.06 91.11 24.00
2 1.89 2121.32 46.04 90.69 25.37
3 1.81 2167.27 46.02 90.43 25.92
4 1.81 2148.15 46.04 90.23 25.69
# for example to select the rows with a date later than 01-10-2016
df_ex = df[df['Date'] > datetime.strptime('01-10-2016', '%d-%m-%Y').date()]
df_ex.head()
Date SP500 Dividend Earnings Consumer Price Index \
12 2017-01-01 2275.12 45.93 96.46 242.84
13 2017-01-02 2329.91 46.15 98.38 243.60
14 2017-01-03 2366.82 46.38 100.29 243.80
15 2017-01-04 2359.31 46.66 101.53 244.52
16 2017-01-05 2395.35 46.94 102.78 244.73
Long Interest Rate Real Price Real Dividend Real Earnings PE10
12 2.43 2340.67 47.25 99.24 28.06
13 2.42 2389.52 47.33 100.89 28.66
14 2.48 2425.40 47.53 102.77 29.09
15 2.30 2410.56 47.67 103.74 28.90
16 2.30 2445.29 47.92 104.92 29.31
Then you can perform the calculations that you want by selecting the appropriate columns/rows this way.