How to combine Excel file with python? (columnAB columnC) I use pandas to combine Excel file.However, it doesn't work.
!pip install xlrd
!pip install xlwt
!pip install openpyxl
import xlrd
import xlwt
import openpyxl
import pandas as pd
df = pd.DataFrame()
for f in ['MOMO摩天-專品商品名稱購買人數.xls', "MOMO摩天-專品價格.xls"]:
data = pd.read_excel(f, 'Sheet1')
data.index = [os.path.basename(f)]
df = df.append(data)
df.to_excel('Combine.xls')
CodePudding user response:
You can convert Excel files to two distinct Dataframe and then use this function to merge two Dataframe in one:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
ex:
> df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
'value': [1, 2, 3, 5]})
> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
'value': [5, 6, 7, 8]})
> df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
> df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
> df1.merge(df2, left_on='lkey', right_on='rkey')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
CodePudding user response:
You can import an Excel file into Python using Pandas 'read_excel'.
import pandas as pd
#Replace - Path where the Excel file is stored\File name.xlsx
#Replace - your Excel sheet name
df = pd.read_excel (r'Path where the Excel file is stored\File name.xlsx', sheet_name='your Excel sheet name')
print (df)
Combine Two Columns in Pandas.
You can use the following syntax to combine two text columns into one in a pandas DataFrame:
#Replace - Column Names with applicable column names
df['new_column'] = df['column1'] df['column2']
You can export Pandas DataFrame to an Excel file using to_excel
#Replace - Path where the exported excel file will be stored\File Name.xlsx
df.to_excel(r'Path where the exported excel file will be stored\File Name.xlsx', index = False)
CodePudding user response:
import pandas as pd
df1 = pd.read_excel(name1) #if doesn't work, change file format to xlsx
df2 = pd.read_excel(name2)
Now use merge or join. For example:
df = df1.join(df2)