With a basic DataFrame that has a single header row, one can iterate over the rows and access the values by column name:
import pandas as pd
df = pd.DataFrame(columns=['header1_column1', 'header1_column2'])
df['header1_column1'] = range(2)
df['header1_column2'] = range(2)
print(df)
   header1_column1  header1_column2
0                0                0
1                1                1
for index, row in df.iterrows():
    print(row['header1_column1'])
0
1
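However, with a DataFrame that has multiple header rows (MultiIndex columns), accessing a value by its top-level column name while iterating returns a one-element Series rather than a scalar, so the printed output carries some overhead (the second-level label, Name, and dtype):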
df = pd.DataFrame(columns=[['header1_column1', 'header1_column2'],
                           ['header2_column1', 'header2_column2']])
df['header1_column1'] = range(2)
df['header1_column2'] = range(2)
print(df)
  header1_column1 header1_column2
  header2_column1 header2_column2
0               0               0
1               1               1
for index, row in df.iterrows():
    print(row['header1_column1'])
header2_column1    0
Name: 0, dtype: int64
header2_column1    1
Name: 1, dtype: int64
How can I eliminate the overhead and have the same output as in the first case?
CodePudding user response:
I think you need to select by tuple for the MultiIndex columns:
for index, row in df.iterrows():
    print(row[('header1_column1', 'header2_column1')])
0
1
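To illustrate why the tuple helps, here is a minimal sketch. It assumes the frame is built directly from tuples with pd.MultiIndex.from_tuples rather than filled in afterwards as in the question, and the names partial, full, flat and values are only illustrative. A key that names just the first level returns a Series indexed by the remaining level, while the full tuple returns the scalar. If the second header level is not needed at all, dropping it with droplevel is another option that should restore plain column-name access:

```python
import pandas as pd

# Assumption for this sketch: build the two-level columns from tuples
# instead of assigning ranges to an empty frame as the question does.
columns = pd.MultiIndex.from_tuples([
    ('header1_column1', 'header2_column1'),
    ('header1_column2', 'header2_column2'),
])
df = pd.DataFrame([[0, 0], [1, 1]], columns=columns)

row = df.iloc[0]

# Partial key (first level only): a Series indexed by the second level.
partial = row['header1_column1']

# Full tuple key: the scalar value itself.
full = row[('header1_column1', 'header2_column1')]

# Alternative: drop the second header level, then use plain column names.
flat = df.droplevel(1, axis=1)
values = [int(r['header1_column1']) for _, r in flat.iterrows()]
print(values)
```

Note that dropping a level only makes sense when the remaining first-level names are still unique; otherwise selecting by the full tuple is the safer choice.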