Adding two dataframes in pandas with different columns

Time:09-21

I would like to add two dataframes together element-wise, as in matrix addition over (i, j): column 1 is added to column 1, column 2 to column 2, and so on. If a column exists in only one of the dataframes, it should still be carried over into the result.

The output should be a dataframe with the index ['Sun', 'Wind', 'Water', 'Flow'] shown as a column, and with data columns covering range(1, 22), i.e. 1 through 21.

All values are currently 0, but if column "2", cell 3 in dt1 is 200, then this cell is added to column "2" cell 3 in dt2 which is 10 for the total of 210.

import pandas as pd 
cols = range(1, 20)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label (a single row would not match the 4 labels)
rows = [[0] * len(cols) for _ in idx]

dt1 = pd.DataFrame(rows, index=idx, columns=cols)
dt1 = dt1.reset_index()

cols = range(3, 22)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt2 = pd.DataFrame(rows, index=idx, columns=cols)
dt2 = dt2.reset_index()


TRIED: 
df = dt1[dt1.columns[1:]].add(dt2[dt2.columns[1:]]).fillna(0)

It may be that matrix addition with two for loops is the way forward; however, I'm not sure how to make the right values end up in the right columns.
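A note on why the attempt above is not enough: with the all-zero data it looks correct, but `.add()` aligns on column labels, so any column present in only one frame becomes NaN, and `.fillna(0)` then replaces those NaN with 0 instead of the original values. The sketch below uses small stand-in frames (`a`, `b`, and their values are made up for illustration) to show the effect.

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
# Stand-ins for dt1 and dt2: columns 1-3 and 2-4, so only 2 and 3 overlap.
a = pd.DataFrame(1, index=idx, columns=[1, 2, 3])
b = pd.DataFrame(10, index=idx, columns=[2, 3, 4])

summed = a.add(b)            # columns 1 and 4 are NaN: no partner to add to
attempt = summed.fillna(0)   # the NaN become 0, so the original 1s and 10s are lost

print(summed)
print(attempt)
```

Columns 2 and 3 are summed as intended, while columns 1 and 4 end up as 0 rather than keeping the values 1 and 10.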

CodePudding user response:

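I think you could reindex the columns of both dataframes to the full range (filling the missing columns with 0) and then add them, like this:

dt1 = dt1.set_index('index').reindex(columns=range(1, 22), fill_value=0)

dt2 = dt2.set_index('index').reindex(columns=range(1, 22), fill_value=0)

dt3 = dt1 + dt2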

CodePudding user response:

If the two dataframes have the same shape and should be added by position (.values drops the labels of dt2):

>>> dt1.iloc[:, 1:].add(dt2.iloc[:, 1:].values)

Or don't reset_index:

>>> dt1 + dt2
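The two options above behave differently on the question's data, which the following sketch tries to make visible. It assumes the frames are built without reset_index (so no `.iloc[:, 1:]` slicing is needed) and that the values 200 and 10 sit in row 'Water', column 3, as in the question's description.

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
dt1 = pd.DataFrame(0, index=idx, columns=range(1, 20))
dt2 = pd.DataFrame(0, index=idx, columns=range(3, 22))
dt1.loc['Water', 3] = 200
dt2.loc['Water', 3] = 10

# Positional: dt2's first column (label 3) lands on dt1's first column (label 1).
positional = dt1.add(dt2.values)

# Label-based: column 3 meets column 3; labels present on one side only give NaN.
by_label = dt1 + dt2

print(positional.loc['Water', [1, 3]])
print(by_label.loc['Water', [1, 3, 21]])
```

Positional addition puts the 10 under column 1 and leaves 200 in column 3, so it only suits frames whose columns really correspond by position. Label-based addition gives 210 in column 3, but the non-overlapping columns still need to be filled.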

CodePudding user response:

Here's my solution as I understand your question.

import pandas as pd

cols = range(1, 20)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt1 = pd.DataFrame(rows, index=idx, columns=cols)

cols = range(3, 22)
idx = ['Sun', 'Wind', 'Water', 'Flow']
rows = [[0] * len(cols) for _ in idx]

dt2 = pd.DataFrame(rows, index=idx, columns=cols)

# Add dataframes based on matching columns and index
dt3 = dt1 + dt2

# Fill each column that doesn't overlap with values of other columns
for col in dt3:
    if col in dt1.columns:
        # Assign the result back; fillna(inplace=True) on a selected column may not update dt3
        dt3[col] = dt3[col].fillna(dt1[col])
    if col in dt2.columns:
        dt3[col] = dt3[col].fillna(dt2[col])
        
# NaN is a float type, so convert whole df back to integers        
dt3 = dt3.astype(int)
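The add-then-fill loop above can probably be condensed with the `fill_value` parameter of `DataFrame.add`, which substitutes a value for cells missing on one side before adding. A minimal sketch, assuming the frames are built without reset_index; the value 7 in column 21 is made up here to show that a dt2-only column is carried over:

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
dt1 = pd.DataFrame(0, index=idx, columns=range(1, 20))
dt2 = pd.DataFrame(0, index=idx, columns=range(3, 22))
dt1.loc['Water', 3] = 200
dt2.loc['Water', 3] = 10
dt2.loc['Flow', 21] = 7   # illustrative value in a column only dt2 has

# fill_value=0 stands in for a cell that is missing on one side only
dt3 = dt1.add(dt2, fill_value=0).astype(int)

print(dt3)
```

The `astype(int)` step is there for the same reason as in the answer above: alignment introduces NaN internally, which turns the result into floats.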

CodePudding user response:

Using difference and intersection, you could add the unknown columns from dt2 into dt1 and then sum those columns in common. The assumption here is that you want row-wise addition (that is, each dataset has rows in common), so reset_index is not needed.

import pandas as pd
cols = range(1, 20)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt1 = pd.DataFrame(rows, index=idx, columns=cols)

cols = range(3, 22)
idx = ['Sun', 'Wind', 'Water', 'Flow']
rows = [[0] * len(cols) for _ in idx]

dt2 = pd.DataFrame(rows, index=idx, columns=cols)

# Insert new columns from dt2 into dt1 then add common columns
common_columns = dt1.columns.intersection(dt2.columns)
new_columns = dt2.columns.difference(dt1.columns)
dt1[new_columns] = dt2[new_columns]
dt1[common_columns] += dt2[common_columns]
del dt2
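The approach above modifies dt1 in place. If both inputs should stay untouched, one alternative (a different technique, not the one in this answer) is to stack the frames with `pd.concat` and sum the rows that share an index label. A rough sketch under the same row-wise assumption, with the question's 200 and 10 placed in row 'Water', column 3:

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
dt1 = pd.DataFrame(0, index=idx, columns=range(1, 20))
dt2 = pd.DataFrame(0, index=idx, columns=range(3, 22))
dt1.loc['Water', 3] = 200
dt2.loc['Water', 3] = 10

# Stack the frames (columns become the union), then sum rows sharing a label.
# sort=False keeps the row labels in their first-seen order.
total = pd.concat([dt1, dt2]).groupby(level=0, sort=False).sum().astype(int)

print(total)
```

Because the list passed to `pd.concat` can hold any number of frames, this form also extends to more than two dataframes.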

CodePudding user response:

You can get the union of columns by Index.union(), reindex by .reindex() with fill value 0. Then .add() the 2 dataframes and .reset_index(), as follows:

dt1a = dt1.set_index('index')
dt2a = dt2.set_index('index')
all_cols = dt1a.columns.union(dt2a.columns)

dt1b = dt1a.reindex(all_cols, axis=1, fill_value=0)
dt2b = dt2a.reindex(all_cols, axis=1, fill_value=0)

df_out = dt1b.add(dt2b).reset_index()

Data Input

dt1.at[2, 3] = 200

print(dt1)

   index  1  2    3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
0    Sun  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
1   Wind  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
2  Water  0  0  200  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
3   Flow  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0

dt2.at[2, 3] = 10

print(dt2)

   index   3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
0    Sun   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
1   Wind   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
2  Water  10  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
3   Flow   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0

Output

print(df_out)


   index  1  2    3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
0    Sun  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
1   Wind  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
2  Water  0  0  210  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
3   Flow  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
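To check the Data Input and Output above end to end, the steps can be wrapped in a small function and run on freshly built frames. This is only a sketch: `add_on_union` is a hypothetical helper name (it is not part of pandas), and it assumes the labels live in a column called 'index', as produced by reset_index in the question.

```python
import pandas as pd

def add_on_union(left, right, key='index'):
    """Add two frames over the union of their columns; `key` holds the row labels."""
    a = left.set_index(key)
    b = right.set_index(key)
    all_cols = a.columns.union(b.columns)
    # Columns missing from one frame are created there and filled with 0
    a = a.reindex(all_cols, axis=1, fill_value=0)
    b = b.reindex(all_cols, axis=1, fill_value=0)
    return a.add(b).reset_index()

idx = ['Sun', 'Wind', 'Water', 'Flow']
dt1 = pd.DataFrame(0, index=idx, columns=range(1, 20)).reset_index()
dt2 = pd.DataFrame(0, index=idx, columns=range(3, 22)).reset_index()
dt1.at[2, 3] = 200
dt2.at[2, 3] = 10

df_out = add_on_union(dt1, dt2)
print(df_out)
```

Since every missing column is filled with 0 before the addition, no NaN appears and the result stays integer-typed without an extra conversion.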