Let's say df is a typical pandas.DataFrame instance. I am trying to understand why list(df) returns a list of column names.
The goal is to track this down in the source code and understand how list(<pd.DataFrame>) returns a list of column names.
So far, the best resources I've found are the following:
- Get a list from Pandas DataFrame column headers
  - Summary: There are multiple ways of getting a list of DataFrame column names, and they differ in performance or idiomatic convention.
- SO Answer
  - Summary: DataFrame follows a dict-like convention, thus coercing with list() would return a list of the keys of this dict-like structure.
- pandas.DataFrame source code:
  - I can't find anything in the source code that points to how list() would create a list of column names.
CodePudding user response:
DataFrames are iterable. That's why you can pass them to the list constructor. list(df) is equivalent to [c for c in df]. In both cases, DataFrame.__iter__ is called.
When you iterate over a DataFrame, you get the column names.
Why? Because the developers probably thought this is a nice thing to have.
Looking at the source, __iter__ returns an iterator over the attribute _info_axis, which seems to be the internal name of the columns.
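To make the mechanism concrete, here is a minimal sketch of the same pattern in plain Python. MiniFrame is a hypothetical stand-in written for illustration, not pandas code. It only assumes, as described above, that __iter__ hands back an iterator over the column labels.

```python
class MiniFrame:
    """Hypothetical toy class for illustration; not part of pandas."""

    def __init__(self, data):
        # data maps column name -> list of values,
        # like the dict you would pass to pd.DataFrame
        self._data = dict(data)

    @property
    def columns(self):
        # Column labels in insertion order
        return list(self._data)

    def __iter__(self):
        # list(obj) calls iter(obj), which in turn calls obj.__iter__()
        return iter(self.columns)


mf = MiniFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

column_names = list(mf)          # goes through MiniFrame.__iter__
same_names = [c for c in mf]     # so does a comprehension
print(column_names)
```

Anything that defines __iter__ this way will produce its labels, not its values, when passed to list().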
CodePudding user response:
As you correctly stated in your question, one can think of a pandas DataFrame as a list of lists (or, more correctly, a dict-like object).
Take a look at this code, which takes a dict and parses it into a DataFrame.
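import pandas as pd
# create a dataframe from a dict of column name -> values
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
print(df)
# list() on the DataFrame yields its column names
x = list(df)
print(x)
# list() on the original dict yields its keys
x = list(d)
print(x)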
The result in both cases (for the DataFrame df and the dict d) is this:
['col1', 'col2']
['col1', 'col2']
This result confirms your thinking that a "DataFrame follows a dict-like convention".
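The dict-like behaviour goes a little further than iteration. The short sketch below reuses the d and df from the example above and compares a few ways of reaching the column names. It assumes a reasonably recent pandas installation; the variable names are only illustrative.

```python
import pandas as pd

d = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
df = pd.DataFrame(d)

names_from_iter = list(df)             # iterating the DataFrame itself
names_from_columns = list(df.columns)  # iterating the columns Index directly
names_from_keys = list(df.keys())      # keys(), as on a dict

# Membership tests look at the column labels, again like a dict
has_col1 = "col1" in df
has_value = 1 in df  # 1 is a cell value, not a column label

print(names_from_iter, names_from_columns, names_from_keys)
print(has_col1, has_value)
```

All three lists hold the same column names, and the in operator checks labels, not cell values, which matches how a plain dict treats its keys.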