Why does `list(<pd.DataFrame>)` return a list of column names?

Let's say df is a typical pandas.DataFrame instance. I am trying to understand why list(df) returns a list of column names.

The goal here is for me to track it down in the source code to understand how list(<pd.DataFrame>) returns a list of column names.

So far, the best resources I've found are the following:

Get a list from Pandas DataFrame column headers
- Summary: There are multiple ways of getting a list of DataFrame column names, and each varies either in performance or idiomatic convention.
SO Answer
- Summary: DataFrame follows a dict-like convention, thus coercing with list() would return a list of the keys of this dict-like structure.
pandas.DataFrame source code:
- I can't find anything in the source code that shows how list() would create a list of column names.

CodePudding user response：

DataFrames are iterable. That's why you can pass them to the list constructor.

list(df) is equivalent to [c for c in df]. In both cases, DataFrame.__iter__ is called.

When you iterate over a DataFrame, you get the column names.

Why? Probably because the developers wanted a DataFrame to behave like a dict, where iterating yields the keys.

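Looking at the source, __iter__ returns an iterator over the attribute _info_axis, which for a DataFrame refers to the columns. As far as I can tell, the method is defined on the base class NDFrame (pandas/core/generic.py) rather than on DataFrame itself, which would explain why it is hard to find in the DataFrame source.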
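To see why defining __iter__ is enough for list() to work, here is a minimal sketch. MiniFrame is a hypothetical toy class, not pandas code; it only borrows the attribute name _info_axis to mimic the pattern described above.

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame; not part of pandas."""

    def __init__(self, data):
        # Store columns in a dict: column name -> list of values.
        self._data = dict(data)

    @property
    def _info_axis(self):
        # In this toy, the "info axis" is simply the list of column names.
        return list(self._data)

    def __iter__(self):
        # list(obj) and for-loops both end up calling this method.
        return iter(self._info_axis)


mf = MiniFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
names_from_list = list(mf)
names_from_loop = [c for c in mf]
print(names_from_list)
print(names_from_loop)
```

Both calls go through __iter__, so they produce the same column names. The values stored in the columns are never touched.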

CodePudding user response：

As you correctly stated in your question, one can think of a pandas DataFrame as a list of lists or, more correctly, as a dict-like object.

Take a look at this code, which takes a dict and turns it into a DataFrame.

import pandas as pd

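# build a DataFrame from a dict that maps column names to values
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)

print(df)

# list() on the DataFrame yields its column names
x = list(df)
print(x)

# list() on the original dict yields its keys
x = list(d)
print(x)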

After the printed DataFrame itself, the last two print calls (for the DataFrame df and the dict d) output the same list:

['col1', 'col2']
['col1', 'col2']

This result confirms your thinking that a "DataFrame follows a dict-like convention".
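The dict-like convention shows up in a few other places as well. The sketch below assumes pandas is installed and rebuilds the same df as above. It checks keys(), the in operator, and items(); the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# keys() returns the column labels, much like dict.keys() returns the keys.
keys = list(df.keys())

# Membership tests look at column names, not at cell values.
has_col1 = "col1" in df
has_value_1 = 1 in df

# items() yields (column name, column) pairs, much like dict.items().
item_names = [name for name, column in df.items()]

print(keys, has_col1, has_value_1, item_names)
```

Note that 1 in df is False even though the value 1 appears in col1, because membership is tested against the column labels, just as it would be against the keys of a dict.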