Working with Two Different Input Files --- example: Hourly Data and Daily Data (with different lengt

I'm working on some code to manipulate hourly and daily data for a year and am a little confused about how to combine data from the two files. What I am doing is using the hourly pattern of Data Set B but scaling it using Data Set A. In essence (using the example below), I take the daily average (Data Set A) of 93 cfs and multiply it by 24 hours in a day, which equals 2232. I then sum the hourly cfs values for all 24 hours of each day (Data Set B), which for 1/1/2021 equals 2596. Normally manipulating a rate this way doesn't make sense, but here it doesn't matter because the units cancel out. I then divide one value by the other, 2232/2596 ≈ 0.8598, and apply that factor to the hourly cfs values for all 24 hours of that day (Data Set B) to get a new "scaled" dataset (Data Set C).
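For concreteness, the arithmetic for 1/1/2021 can be sketched in Python as follows. The numbers are the ones quoted above; the variable names are purely illustrative and not taken from the original code.

```python
# Worked example for 1/1/2021 using the numbers quoted in the question.
daily_average_cfs = 93.0   # Data Set A value for the day
hourly_sum_cfs = 2596.0    # sum of the 24 hourly values in Data Set B

daily_total = daily_average_cfs * 24          # 93 * 24
scale_factor = daily_total / hourly_sum_cfs   # ratio applied to each hour

# Apply the factor to the first three hourly values listed for that day.
first_hours = [150.0, 0.0, 255.0]
scaled = [value * scale_factor for value in first_hours]

print(daily_total, round(scale_factor, 4), scaled)
```

A useful sanity check on the method: once all 24 hourly values of a day are scaled this way, their mean equals the daily average from Data Set A.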

My problem is that I have never coded in Python using two different input datasets (I am a complete newbie). I started experimenting with the code, but I can't seem to integrate the two datasets. If anyone can point me in the direction of how to integrate two separate input files I'd be most appreciative. Beneath the datasets is my attempt at the code (please note the reverse order: it works first with the hourly data (Data Set B) and then with the daily data (Data Set A)). My printout of the final scaling factor (SF) gives only one value, not all 8,760, because the calculation is outside the loops. But how can I be in the loop of both input files at the same time?

Data Set A (Daily) -- 365 lines of data:

1/1/2021 93 cfs
1/2/2021 0 cfs
1/3/2021 70 cfs
1/4/2021 70 cfs

Data Set B (Hourly) -- 8,760 lines of data:

1/1/2021 0:00 150 cfs
1/1/2021 1:00 0 cfs
1/1/2021 2:00 255 cfs
(where summation of all 24 hrs of 1/1/2021 = 2596 cfs) etc.

Sorry if this is a ridiculously easy question... I am very new to coding.

Here is the code that I've written so far. What I need is 8,760 lines of SF that I can then multiply by the original Data Set B. The final product, Data Set C, will be Date - Time - rescaled hourly data. I actually have to do this for three pumping units in total, giving a matrix of 5 columns by 8,760 rows, but I think I'll be able to figure the unit part out. My problem now is how to integrate the two data sets. Thank you for reading!

print('Solving the Temperature Model programming problem')
fhand1 = open('Interpolate_CY21_short.txt')
fhand2 = open('WSE_Daily_CY21_short.txt')

#Hourly Interpolated Pardee PowerHouse Data
for line1 in fhand1:
    line1 = line1.rstrip()
    words1 = line1.split()
    #Hourly interpolated data - parsed down (cfs)
    x = float(words1[7])
    if x<100:
        x = 0
    #print(x)

#WSE Daily Average PowerHouse Data
for line2 in fhand2:
    line2 = line2.rstrip()
    words2 = line2.split()
    #Daily cfs average x 24 hrs
    aa = float(words2[2])*24
    #print(a)

SF = x * aa
print(SF)

CodePudding user response：

You could probably use a nested for loop:

daily_average = ["1/1/2021 93 cfs","1/2/2021 0 cfs"]
daily = ["1/1/2021 0:00 150 cfs", "1/1/2021 1:00 0 cfs", "1/2/2021 1:00 0 cfs"]


for average_line in daily_average:
    average_line = average_line.rstrip()
    average_date, average_count, average_symbol = average_line.split()

    for daily_line in daily:
        daily_line = daily_line.rstrip()
        date, hour, count, symbol = daily_line.split()
        if average_date == date:
            print(f"date={date}, average_count={average_count} count={count}")

Or a dictionary

# your input data but not as a file
daily_average = ["1/1/2021 93 cfs","1/2/2021 0 cfs"]
daily = ["1/1/2021 0:00 150 cfs", "1/1/2021 1:00 0 cfs", "1/2/2021 1:00 0 cfs"]


# populate data into dictionaries
daily_average_data = dict()
for line in daily_average:
    line = line.rstrip()
    day, count, symbol = line.split()
    daily_average_data[day] = (day, count, symbol)

daily_data = dict()
for line in daily:
    line = line.rstrip()
    day, hour, count, symbol = line.split()
    if day not in daily_data:
        daily_data[day] = list()
    daily_data[day].append((day, hour, count, symbol))

# now you can access daily_average_data and daily_data as
# dictionaries instead of files

# process data
result = list()
for date in daily_data.keys():
    print(date)
    print(daily_average_data[date])
    print(daily_data[date])
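Building on the dictionary idea, here is a minimal sketch of how the scaling factor from the question could be computed per day and applied to every hourly value. It uses inline sample lines in the question's format rather than the real files, and the names `daily_avg`, `hourly_sum`, and `scaled_rows` are illustrative. Treating a day whose hourly values sum to zero as having a factor of 0 is an assumption, since the question does not say how that case should be handled.

```python
from collections import defaultdict

# Sample lines in the formats shown in the question (not read from files).
daily_lines = ["1/1/2021 93 cfs", "1/2/2021 0 cfs"]
hourly_lines = [
    "1/1/2021 0:00 150 cfs",
    "1/1/2021 1:00 0 cfs",
    "1/1/2021 2:00 255 cfs",
    "1/2/2021 0:00 0 cfs",
]

# Daily average keyed by date string.
daily_avg = {}
for line in daily_lines:
    day, value, _unit = line.split()
    daily_avg[day] = float(value)

# First pass over the hourly data: sum the hourly values for each day.
hourly_sum = defaultdict(float)
for line in hourly_lines:
    day, _hour, value, _unit = line.split()
    hourly_sum[day] += float(value)

# Second pass: scale every hourly value by that day's factor.
scaled_rows = []
for line in hourly_lines:
    day, hour, value, _unit = line.split()
    total = hourly_sum[day]
    # Assumption: use a factor of 0 when the hourly sum is zero,
    # which avoids a division by zero.
    factor = daily_avg[day] * 24 / total if total else 0.0
    scaled_rows.append((day, hour, float(value) * factor))

for row in scaled_rows:
    print(row)
```

Because the lookup is by date string, the two inputs do not need the same number of lines. The sample above has only three hours for 1/1/2021, so its factor differs from the 0.8598 in the question, which is based on all 24 hours.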

If the data items corresponded with one another line by line, you could use zip (see https://realpython.com/python-zip-function/).

Here is an example:

fhand1 = [1,2,3]
fhand2 = [2,3,4]
for data1, data2 in zip(fhand1, fhand2):
    print(f"{data1} {data2}")
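Since the daily and hourly files in the question do not correspond line by line, zip cannot pair them directly. One possible workaround, sketched below, is to group the hourly lines by date first with `itertools.groupby` and then zip each daily line with its group. This assumes both inputs are sorted by date and cover the same days; the sample lines and the `pairs` list are illustrative only.

```python
from itertools import groupby

daily_lines = ["1/1/2021 93 cfs", "1/2/2021 0 cfs"]
hourly_lines = [
    "1/1/2021 0:00 150 cfs",
    "1/1/2021 1:00 0 cfs",
    "1/1/2021 2:00 255 cfs",
    "1/2/2021 0:00 0 cfs",
]

# groupby yields one (date, lines) group per run of consecutive
# hourly lines that share the same date field.
hourly_by_day = groupby(hourly_lines, key=lambda line: line.split()[0])

pairs = []
for daily_line, (day, group) in zip(daily_lines, hourly_by_day):
    daily_day, daily_value, _unit = daily_line.split()
    if daily_day != day:
        raise ValueError(f"Date mismatch: {daily_day} vs {day}")
    hours = list(group)  # the hourly lines belonging to this day
    pairs.append((day, float(daily_value), len(hours)))

print(pairs)
```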

CodePudding user response：

Similar to what @oasispolo described, the solution is to make a single loop and process both lists in it. I'm personally not fond of the "zip" function. (It's a purely stylistic objection; lots of other people like it and that's fine.)

Here's a solution with syntax that I find more intuitive:

print('Solving the Temperature Model programming problem')
# Convert each file into a list of lines. You're doing this
# implicitly, but I like to be explicit about it. The "with"
# blocks close each file once it has been read.
with open('Interpolate_CY21_short.txt', 'r') as fhand1:
    lines1 = fhand1.readlines()
with open('WSE_Daily_CY21_short.txt', 'r') as fhand2:
    lines2 = fhand2.readlines()

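# This loop pairs line N of one file with line N of the other,
# so it only applies when both files have the same number of lines.
if len(lines1) != len(lines2):
    raise ValueError("The two files have different lengths!")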

# Initialize an output array. You could also construct it
# one item at a time, but that can be slow for large arrays.
# It is more efficient to initialize the entire array at
# once if possible.
sf_list = [0]*len(lines1)

for position in range(len(lines1)):
    # range(L) generates numbers 0...L-1
    line1 = lines1[position].rstrip()
    words1 = line1.split()
    x = float(words1[7])
    if x<100:
        x = 0

    line2 = lines2[position].rstrip()
    words2 = line2.split()
    aa = float(words2[2])*24

    sf_list[position] = x * aa

print(sf_list)
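The files in the question have different lengths (365 daily lines versus 8,760 hourly lines), so the equal-length check above would raise an error for them. One way to keep the same single-loop, index-based style is to map each hourly position to its day with integer division. The sketch below assumes both lists are in date order and that the hourly list has exactly 24 entries per day. The values and the names `daily_values`, `hourly_values`, and `day_sums` are hypothetical stand-ins for numbers already parsed from the files.

```python
HOURS_PER_DAY = 24

# Hypothetical parsed values: two days of daily averages and
# the matching 48 hourly values (cfs).
daily_values = [93.0, 0.0]
hourly_values = [100.0] * HOURS_PER_DAY + [0.0] * HOURS_PER_DAY

if len(hourly_values) != len(daily_values) * HOURS_PER_DAY:
    raise ValueError("Expected 24 hourly values for every daily value")

# Sum of the hourly values for each day, in day order.
day_sums = [
    sum(hourly_values[start:start + HOURS_PER_DAY])
    for start in range(0, len(hourly_values), HOURS_PER_DAY)
]

scaled = [0.0] * len(hourly_values)
for position in range(len(hourly_values)):
    day_index = position // HOURS_PER_DAY  # hours 0-23 -> day 0, 24-47 -> day 1
    day_sum = day_sums[day_index]
    # Assumption: use a factor of 0 when the day's hourly sum is zero.
    factor = daily_values[day_index] * HOURS_PER_DAY / day_sum if day_sum else 0.0
    scaled[position] = hourly_values[position] * factor

print(scaled[:3], scaled[HOURS_PER_DAY:HOURS_PER_DAY + 3])
```

If some days in the hourly file might have missing or extra hours, matching on the date string instead of the position would be the safer choice.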