I'm using read_excel to read a file and I need to list each row and column where column is major than zero.
The excel content is similar like this:
fruit | america_inv | asia_inv | aurope_inv
apple | 0.0 | 2.0 | 0.0
banana | 1.0 | 0.0 | 0.0
orange | 2.0 | 1.0 | 3.0
The expected output is something like this:
apple, asia_inv
banana, america_inv
orange, america_inv
orange, asia_inv
orange, europe_inv
Any idea?
I'm frozen at the first steps:
import pandas as pd
#Read excel file
df_excel_data = pd.read_excel('fruits.xlsm', sheet_name='all_fruits', usecols="C:BB", header=7)
#Normalize header
df_excel_data.columns = df_excel_data.columns.str.strip().str.lower().str.replace(' ', '_', regex=True).str.replace('(', '', regex=True).str.replace(')', '', regex=True).str.replace('\n', '_', regex=True)
df = pd.DataFrame(df_excel_data)
CodePudding user response:
You can use melt
:
out = (df.melt('fruit', var_name='continent', ignore_index=False)
.loc[lambda x: x.pop('value') > 0]
.sort_index(ignore_index=True))
print(out)
# Output
fruit continent
0 apple asia_inv
1 banana america_inv
2 orange america_inv
3 orange asia_inv
4 orange aurope_inv
Or (perhaps more understandable):
out = (df.melt('fruit', var_name='continent', ignore_index=False)
.query('value > 0')[['fruit', 'continent']]
.sort_index(ignore_index=True))