I have a .xlsx file I am creating two dataframes from and then trying to concat them so that the right columns append to the bottom of the left columns. How do I get the data in the 2nd dataframe to move directly below the 1st dataframe with each dataframe having a different number of columns?
import pandas as pd
data = pd.read_excel('test.xlsx')
header = data[['col_1', 'col_2', 'col_3']]
detail = data[['col_4', 'col_5', 'col_6', 'col_7']]
combined = pd.concat([header, detail], ignore_index=True)
combined.to_csv('output.csv')
test.xlsx
col_1 | col_2 | col_3 | col_4 | col_5 | col_6 | col_7 |
---|---|---|---|---|---|---|
aaaaa | bbbbb | ccccc | ddddd | eeeee | fffff | ggggg |
Actual: output.csv
col_1 | col_2 | col_3 | col_4 | col_5 | col_6 | col_7 |
---|---|---|---|---|---|---|
aaaaa | bbbbb | ccccc | ||||
ddddd | eeeee | fffff | ggggg |
Expected: output.csv
col_1 | col_2 | col_3 | col_4 | col_5 | col_6 | col_7 |
---|---|---|---|---|---|---|
aaaaa | bbbbb | ccccc | ||||
ddddd | eeeee | fffff | ggggg |
CodePudding user response:
Can you try this:
header = data[['col_1', 'col_2', 'col_3']]
detail = data[['col_4', 'col_5', 'col_6', 'col_7']]
detail.columns=data.columns[:len(detail.columns)]
header.columns=data.columns[:len(header.columns)]
combined = pd.concat([header, detail])
for i in data.columns:
if i in combined.columns:
pass
else:
combined[i]=np.nan
CodePudding user response:
Here is a proposition with pandas.concat
:
header = ['col_1', 'col_2', 'col_3']
detail = ['col_4', 'col_5', 'col_6', 'col_7']
func = lambda x: dict(zip(x, range(1, len(x) 1)))
out = (
pd.concat([data.loc[:, header].rename(columns= func(header)),
data.loc[:, detail].rename(columns= func(detail))],
ignore_index=True, axis=0)
)
out.loc[:, header] = np.NaN
out.columns = data.columns
# Output :
print(out)
col_1 col_2 col_3 col_4 col_5 col_6 col_7
0 aaaaa bbbbb ccccc NaN NaN NaN NaN
1 ddddd eeeee fffff ggggg NaN NaN NaN