I have 2 data frames that have been scraped from online and I need to combine them together into one data frame to export to excel. But I'm running into formatting issues and need to see if someone can help me solve this, please.
Dataframe 1=
df1= pd.DataFrame(table_contents)
df1= df1.replace(r'\n','',regex=True)
print(df1)
results:
0 1 2
0 Order Number Manager Order Date
1 Z57-808456-9 Victor Tully 01/13/2022
Dataframe2=
order_list.append(order_info)
df2 = pd.DataFrame(order_list)
df2.head()
print(df2)
results:
Order Number Location Zip Code
0 Z57-808456-9 Department 28 48911
I've tried using a few different alternatives but still not getting proper results.
combined_dfs= pd.concat([df1,df2],axis=1,join="inner")
print (combined_dfs)
results:
Order Number Location Zip Code 0 1 2
0 Z57-808456-9 Department 28 48911 Order Number Manager Order Date
I was trying to get them all together on 2 rows and possibly remove the duplicate Order Number that shows up on both. If not I can still live with it altogether and a duplicate.
expected results:
Order Number Location Zip Code Manager Order Date
Z57-808456-9 Department 28 48911 Victor Tully 01/13/2022
CodePudding user response:
You can create columns by first row in DataFrame.set_axis
, remove first row by iloc[1:]
and then join with df2
:
df = df1.set_axis(df1.iloc[0], axis=1, inplace=False).iloc[1:]
combined_dfs = df2.merge(df, on='Order Number')
print (combined_dfs)
Order Number Location Zip Code Manager Order Date
0 Z57-808456-9 Department 28 48911 Victor Tully 01/13/2022
CodePudding user response:
It seems in your first data frame you have the column names as the first row. You can drop the first row and rename the columns, then merge the two dataframes.
# remove first row of data
df1 = df1.iloc[1:].reset_index()
# set column names
df1.columns = ['Order Number', 'Location', 'Zip Code']
# merge dataframes on order number
combined_df = pd.merge(df1, df2, on='Order Number', how='inner')
CodePudding user response:
pd.merge(df1, df2, on='Order Number')