Home > Enterprise >  pandas.read_excel list if column is major
pandas.read_excel list if column is major

Time:06-02

I'm using read_excel to read a file and I need to list each row and column where column is major than zero.

The excel content is similar like this:

fruit | america_inv | asia_inv | aurope_inv
apple | 0.0 | 2.0 | 0.0 
banana | 1.0 | 0.0 | 0.0
orange | 2.0 | 1.0 | 3.0

The expected output is something like this:

apple, asia_inv
banana, america_inv
orange, america_inv
orange, asia_inv
orange, europe_inv

Any idea?

I'm frozen at the first steps:

import pandas as pd

#Read excel file
df_excel_data = pd.read_excel('fruits.xlsm', sheet_name='all_fruits', usecols="C:BB", header=7)

#Normalize header
df_excel_data.columns = df_excel_data.columns.str.strip().str.lower().str.replace(' ', '_', regex=True).str.replace('(', '', regex=True).str.replace(')', '', regex=True).str.replace('\n', '_', regex=True)


df = pd.DataFrame(df_excel_data)

CodePudding user response:

You can use melt:

out = (df.melt('fruit', var_name='continent', ignore_index=False)
         .loc[lambda x: x.pop('value') > 0]
         .sort_index(ignore_index=True))
print(out)

# Output
    fruit    continent
0   apple     asia_inv
1  banana  america_inv
2  orange  america_inv
3  orange     asia_inv
4  orange   aurope_inv

Or (perhaps more understandable):

out = (df.melt('fruit', var_name='continent', ignore_index=False)
         .query('value > 0')[['fruit', 'continent']]
         .sort_index(ignore_index=True))
  • Related