Home > Enterprise >  Assemble a dataframe from two csv files
Assemble a dataframe from two csv files

Time:04-26

I wrote the following code to form a data frame containing the energy consumption and the temperature. The data for each of the variables is collected from a different csv file:

def match_data():
pwr_data = pd.read_csv(r'C:\\Users\X\Energy consumption per hour-data-2022-03-16 17_50_56_Edited.csv')
temp_data = pd.read_csv(r'C:\\Users\X\temp.csv')
new_time = []
new_pwr = []
new_tmp = []
for i in range(1,len(pwr_data)):
    for j in range(1,len(temp_data)):
        if pwr_data['time'][i] == temp_data['Date'][j]:
            time = pwr_data['time'][i]
            pwr = pwr_data['watt_hour'][i]
            tmp = temp_data['Temp'][j]
            new_time.append(time)
            new_pwr.append(pwr)
            new_tmp.append(tmp)


return pd.DataFrame({'Time' : new_time,'watt_hour' : new_pwr,'Temp':new_tmp})     

I was trying to collect data with matching time indices so that I can assemble them in a data frame. The code works well but it takes time(43 seconds for around 1300 data points). At the moment I don't have much data but I was wondering if there was a more efficient and faster way to do so

CodePudding user response:

Do the pwr_data['time'] and temp_data['Date] columns have the same granularity?

If so, you can pd.merge() the two dataframes after reading them.

# read data
pwr_data = pd.read_csv(r'C:\\Users\X\Energy consumption per hour-data-2022-03-16 17_50_56_Edited.csv')
temp_data = pd.read_csv(r'C:\\Users\X\temp.csv')

# merge data on time and Date columns
# you can set the how to be 'inner' or 'right' depending on your needs
df = pd.merge(pwr_data, temp_data, how='left', left_on='time', right_on='Date')

CodePudding user response:

Just like @greco recommended this did the trick and in no time!

pd.merge(pwr_data,temp_data,how='inner',left_on='time',right_on='Date')

'time' and Date are the columns on which you want to base the merge.

  • Related