Home > Back-end >  Cant combining 2 dataframes without formatting issues?
Cant combining 2 dataframes without formatting issues?

Time:04-28

I have 2 data frames that have been scraped from online and I need to combine them together into one data frame to export to excel. But I'm running into formatting issues and need to see if someone can help me solve this, please.

Dataframe 1=

df1= pd.DataFrame(table_contents)
    df1= df1.replace(r'\n','',regex=True)
    print(df1)

results:

           0                     1           2
0    Order Number       Manager         Order Date
1  Z57-808456-9         Victor Tully    01/13/2022

Dataframe2=

order_list.append(order_info)
df2 = pd.DataFrame(order_list)
df2.head()
print(df2)

results:

     Order Number      Location        Zip Code
 0  Z57-808456-9    Department 28      48911

I've tried using a few different alternatives but still not getting proper results.

combined_dfs= pd.concat([df1,df2],axis=1,join="inner")
print (combined_dfs)

results:

  Order Number       Location   Zip Code            0                  1          2
 0 Z57-808456-9  Department 28  48911       Order Number         Manager       Order Date

I was trying to get them all together on 2 rows and possibly remove the duplicate Order Number that shows up on both. If not I can still live with it altogether and a duplicate.

expected results:

  Order Number       Location   Zip Code      Manager           Order Date          
  Z57-808456-9  Department 28   48911       Victor Tully         01/13/2022       

CodePudding user response:

You can create columns by first row in DataFrame.set_axis, remove first row by iloc[1:] and then join with df2:

df = df1.set_axis(df1.iloc[0], axis=1, inplace=False).iloc[1:]
combined_dfs = df2.merge(df, on='Order Number')

print (combined_dfs)
   Order Number       Location  Zip Code       Manager  Order Date
0  Z57-808456-9  Department 28     48911  Victor Tully  01/13/2022

CodePudding user response:

It seems in your first data frame you have the column names as the first row. You can drop the first row and rename the columns, then merge the two dataframes.

# remove first row of data
df1 = df1.iloc[1:].reset_index()
# set column names
df1.columns = ['Order Number', 'Location', 'Zip Code']
# merge dataframes on order number
combined_df = pd.merge(df1, df2, on='Order Number', how='inner')

CodePudding user response:

pd.merge(df1, df2, on='Order Number')
  • Related