Take single column from multiple CSV files and place them as new columns in dataframe

Time:04-28

I have multiple CSV files with the same column headers that look like this:

|      Date & Time      |  Rain |    Flow    |
| --------------------- | ----- | ---------- |
|    3/19/2018 12:00    |   0   |    0.51    |
|    3/19/2018 13:00    |   2   |    0.51    |
...

I want to take the 'Flow' column from each CSV and place the columns side by side, aligned by date. The problem is that the Date & Time values differ between the CSVs. Where a file has no value for a given date, I want the merged result to contain an empty cell or NaN.

I created a new dataframe that has a range of dates that encapsulates all the dates found in the list of CSVs, but I am unable to merge the columns accordingly.

The final dataframe would look something like

|      Date & Time      |    CSV 1 Flow    |    CSV 2 Flow    |    CSV 3 Flow    |
| --------------------- | ---------------- | ---------------- | ---------------- |
|    3/19/2018 12:00    |       0.51       |        NaN       |       0.34       |
|    3/19/2018 13:00    |       0.51       |        NaN       |       0.47       |
...

What I tried so far looks like:

csv_files = glob.glob(os.path.join(pwd, "*.csv"))
range = pd.date_range('2017-01-01', periods=45985, freq='H')
df_full = pd.DataFrame({'Date & Time': range})

for j in csv_files:
   df_full[j]=''
   df_hourly = pd.read_csv(j, usecols=['Date & Time','Flow'])
   df_merged = pd.merge(df_full, df_hourly, on='Date & Time', how='left')

I have since changed the code to:

range = pd.date_range('2017-01-01', periods=45985, freq='H')
df_full = pd.DataFrame({'Date & Time': range})
for filename in csv_files:
  df_full[filename] = ''
  df = pd.read_csv(filename,header=0, parse_dates=['Date & Time'], 
  usecols=['Date & Time', 'Flow'])
  df_combined = pd.merge(left=df_full,right=df, on='Date & Time', how='outer')
df_combined

Which gives an output DF that looks like

|      Date & Time      |   CSV 1 Filepath |   CSV 2 Filepath |...    | - Flow- |
| --------------------- | ---------------- | ---------------- |...    | ------- |
|    01/01/2017 00:00   |      BLANK       |      BLANK       |...    |   0.34  |
|    01/01/2017 01:00   |      BLANK       |      BLANK       |...    |   0.25  |
...

The entire table is blank except for the last column which is labeled 'Flow'. It seems that the script is not putting the values in the correct column.

CodePudding user response:

Try something like this:

df1 = pd.read_csv('example.csv', parse_dates=['Date & Time'])
df2 = pd.read_csv('example.csv', parse_dates=['Date & Time'])
df_all = df1.merge(df2, on='Date & Time', how='left')

print(df_all)

Output:

          Date & Time  Rain_x  Flow_x  Rain_y  Flow_y
0 2018-03-19 12:00:00       0    0.51       0    0.51
1 2018-03-19 13:00:00       2    0.51       2    0.51
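
Since the question asks for NaN wherever a file has no reading, the choice of `how` matters. The sketch below uses two small in-memory frames as hypothetical stand-ins for the CSV files (the values and the `_1`/`_2` suffixes are made up for illustration) to compare `how='left'` with `how='outer'`:

```python
import pandas as pd

# Hypothetical stand-ins for two CSV files with partly different timestamps
df1 = pd.DataFrame({
    "Date & Time": pd.to_datetime(["2018-03-19 12:00", "2018-03-19 13:00"]),
    "Flow": [0.51, 0.51],
})
df2 = pd.DataFrame({
    "Date & Time": pd.to_datetime(["2018-03-19 13:00", "2018-03-19 14:00"]),
    "Flow": [0.47, 0.40],
})

# 'left' keeps only the timestamps present in df1
left = df1.merge(df2, on="Date & Time", how="left", suffixes=("_1", "_2"))

# 'outer' keeps the union of timestamps and fills the gaps with NaN
outer = df1.merge(df2, on="Date & Time", how="outer", suffixes=("_1", "_2"))

print(left)
print(outer)
```

With `how='left'` the 14:00 row from the second frame is dropped, while `how='outer'` keeps it and leaves the first frame's Flow as NaN for that timestamp.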

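Your loop would then look roughly like this, with each Flow column renamed after its file and an outer merge so that no timestamps are dropped:

import glob
import os

import pandas as pd

csv_files = sorted(glob.glob(os.path.join(pwd, "*.csv")))

df_all = None
for file in csv_files:
    df = pd.read_csv(file, parse_dates=['Date & Time'], usecols=['Date & Time', 'Flow'])
    # Name each Flow column after its file so repeated merges do not collide
    df = df.rename(columns={'Flow': os.path.basename(file)})
    # how='outer' keeps every timestamp; missing readings become NaN
    df_all = df if df_all is None else df_all.merge(df, on='Date & Time', how='outer')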
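
As an alternative to merging in a loop, the Flow columns can be indexed by timestamp and handed to `pd.concat`, which aligns them on the index. This is only a sketch: the helper name `combine_flow` and the demo file names `site_a.csv` and `site_b.csv` are hypothetical, and it assumes timestamps are unique within each file.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_flow(folder):
    """Return one DataFrame with a Flow column per CSV file in `folder`."""
    series = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(path, usecols=["Date & Time", "Flow"],
                         parse_dates=["Date & Time"], index_col="Date & Time")
        # Key each Flow column by its file name (hypothetical naming choice)
        series[os.path.basename(path)] = df["Flow"]
    # Concatenating along axis=1 aligns the Series on their timestamps;
    # timestamps missing from a file show up as NaN in that file's column
    return pd.concat(series, axis=1).sort_index()


# Small demo with two made-up files written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "site_a.csv"), "w") as f:
        f.write("Date & Time,Rain,Flow\n"
                "3/19/2018 12:00,0,0.51\n"
                "3/19/2018 13:00,2,0.51\n")
    with open(os.path.join(tmp, "site_b.csv"), "w") as f:
        f.write("Date & Time,Rain,Flow\n"
                "3/19/2018 13:00,0,0.47\n"
                "3/19/2018 14:00,1,0.40\n")
    combined = combine_flow(tmp)

print(combined)
```

This avoids the pre-built date range entirely, since the resulting index is the union of the timestamps found in the files.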