I've two tables :
- df1 is empty with columns = {'A', 'B', 'A', 'C', C', 'A'} and 62 rows.
- df2 is not empty with columns = {'A', 'B', 'C', 'D', 'E', 'F'} and 62 rows.
I would like to fill df1 rows with df2 rows based on column name (note that df1 and df2 are just for the example, my tables have more than 3,000 columns).
I'm currently using the following code:
for col in df1:
if col in df2:
df1[col] = df2[col]
So I was wondering, is it possible to do that without looping through columns ? As it would be done with a pd.merge or pd.concat. I think that looping through columns is increasing my running time a lot when df1 has a lot of columns.
I've also tried using merge or update but as I've duplicate column names in df1 it seems not working.
Thanks in advance.
CodePudding user response:
You can reindex
and update
for in place modification:
df1.update(df2.reindex(df1.columns, axis=1))
example:
df1 = pd.DataFrame(columns=['A', 'B', 'A', 'C', 'C', 'A'],
index=range(4))
df2 = pd.DataFrame(np.arange(24).reshape(4,6),
columns=['A', 'B', 'C', 'D', 'E', 'F'])
df1.update(df2.reindex(df1.columns, axis=1))
print(df1)
output:
A B A C C A
0 0 1 0 2 2 0
1 6 7 6 8 8 6
2 12 13 12 14 14 12
3 18 19 18 20 20 18
CodePudding user response:
If all the columns in df2 exist in df1, you better do:
for col in df2.columns:
if (col in df1.columns):
df1[col] = df2[col]
This would be much quicker then your code with nested loops.