I've been trying to add a multiples sets of data but can't seem to make it work.
Say, I have three 3x9 dataframes
dfA:
A: Wave height | A: Wave period | A: Wave dir | B: Wave height | B: Wave period | B: Wave dir | C: Wave height | C: Wave period | C: Wave dir | |
---|---|---|---|---|---|---|---|---|---|
1/1/2001 | 1.1 | 5 | 90 | 4.5 | 5 | 90 | 3.1 | 5 | 100 |
2/1/2001 | 1.2 | 4 | 91 | 4.7 | 5 | 98 | 3.0 | 5 | 101 |
3/1/2001 | 1.3 | 6 | 92 | 4.8 | 3 | 97 | 2.5 | 5 | 105 |
dfB:
A: Wave height | A: Wave period | A: Wave dir | B: Wave height | B: Wave period | B: Wave dir | C: Wave height | C: Wave period | C: Wave dir | |
---|---|---|---|---|---|---|---|---|---|
3/1/2001 | 4.1 | 5 | 290 | 4.5 | 5 | 290 | 3.1 | 5 | 300 |
4/1/2001 | 4.2 | 4 | 291 | 4.7 | 5 | 298 | 3.0 | 5 | 301 |
5/1/2001 | 4.3 | 6 | 292 | 4.8 | 3 | 297 | 2.5 | 5 | 305 |
dfC:
A: Wave height | A: Wave period | A: Wave dir | B: Wave height | B: Wave period | B: Wave dir | C: Wave height | C: Wave period | C: Wave dir | |
---|---|---|---|---|---|---|---|---|---|
2/1/2001 | 1.1 | 5 | 190 | 5.5 | 5 | 290 | 6.1 | 5 | 104 |
3/1/2001 | 1.2 | 4 | 191 | 5.7 | 5 | 298 | 6.0 | 5 | 106 |
4/1/2001 | 1.3 | 6 | 192 | 5.8 | 3 | 297 | 6.5 | 5 | 103 |
You can see I have wave parameters at 3 locations: A, B, and C. I need to combine the wave parameters at A from dfA,B from dfB, then C from dfC.
So the result should be a 9x3 dataframe with this values:
Wave height | Wave period | Wave dir | |
---|---|---|---|
1/1/2001 | 1.1 | 5 | 190 |
2/1/2001 | 1.2 | 4 | 191 |
3/1/2001 | 1.3 | 6 | 192 |
3/1/2001 | 4.5 | 5 | 290 |
4/1/2001 | 4.7 | 5 | 298 |
5/1/2001 | 4.8 | 3 | 297 |
2/1/2001 | 6.1 | 5 | 104 |
3/1/2001 | 6.0 | 5 | 106 |
4/1/2001 | 6.5 | 5 | 103 |
As you can see, some index will be duplicated so the results will have to be reindexed to 1/1/2001 to 9/1/2001 like this:
Wave height | Wave period | Wave dir | |
---|---|---|---|
1/1/2001 | 1.1 | 5 | 190 |
2/1/2001 | 1.2 | 4 | 191 |
3/1/2001 | 1.3 | 6 | 192 |
4/1/2001 | 4.5 | 5 | 290 |
5/1/2001 | 4.7 | 5 | 298 |
6/1/2001 | 4.8 | 3 | 297 |
7/1/2001 | 6.1 | 5 | 104 |
8/1/2001 | 6.0 | 5 | 106 |
9/1/2001 | 6.5 | 5 | 103 |
I first tried plain concatenating but this encountered duplicate index error. code:
pts = ('A','B','C')
for pt in pts:
df = 'df{}.format(pt)'
new_df = pd.DataFrame(columns=("Wave height","Wave period","Wave dir"))
new_df["Wave Height"] = pd.concat([new_df["Wave height"], df['{}: Wave height'.format(pt)]])
new_df["Wave period"] = pd.concat([new_df["Wave period"], df['{}: Wave period'.format(pt)]])
new_df["Wave dir"] = pd.concat([new_df["Wave dir"], df['{}: Wave dir(pt)]])
I tried concatenating with ignore_index=True but then it only concatenates 1 column and replaces all with N/A.
I trying merge but it seems that it needs certain columns to match which is just as problematic.
CodePudding user response:
You can't have duplicated index values using pd.concat
But you can try with merge and parameter 'outer' :
pd.merge(dfA, dfB, on="name_of_colum_index", how="outer")
Set index name before.
CodePudding user response:
You should be able to use concat
:
df = pd.concat([dfA, dfB, dfC])
idx = pd.MultiIndex.from_arrays(zip(*df.columns.str.split(': ')))
df = df.set_axis(idx, axis=1)
for name, d in df.groupby(level=0, axis=1):
print(f'level: {name}')
print(d.droplevel(level=0, axis=1))
output:
level: A
Wave height Wave period Wave dir
1/1/2001 1.1 5 90
2/1/2001 1.2 4 91
3/1/2001 1.3 6 92
3/1/2001 4.1 5 290
4/1/2001 4.2 4 291
5/1/2001 4.3 6 292
2/1/2001 1.1 5 190
3/1/2001 1.2 4 191
4/1/2001 1.3 6 192
level: B
Wave height Wave period Wave dir
1/1/2001 4.5 5 90
2/1/2001 4.7 5 98
3/1/2001 4.8 3 97
3/1/2001 4.5 5 290
4/1/2001 4.7 5 298
5/1/2001 4.8 3 297
2/1/2001 5.5 5 290
3/1/2001 5.7 5 298
4/1/2001 5.8 3 297
level: C
Wave height Wave period Wave dir
1/1/2001 3.1 5 100
2/1/2001 3.0 5 101
3/1/2001 2.5 5 105
3/1/2001 3.1 5 300
4/1/2001 3.0 5 301
5/1/2001 2.5 5 305
2/1/2001 6.1 5 104
3/1/2001 6.0 5 106
4/1/2001 6.5 5 103