How do df['column_name'] and df[['column_name']] work in pandas?

Time:10-09

I understand that the former gives me a Series whereas the latter gives a DataFrame. What I couldn't figure out is the argument. df[['column_name']] returns a DataFrame. Is that because I'm passing ['column_name'], an iterable, as its data= parameter? I'm struggling to understand how Python works here! My results are as follows:

df['Yil']=
bir     2021
ikki    2020
19      2019
18      2018
17      2017
16      2016
15      2015
10      2010
Name: Yil, dtype: int64

df[['Yil']]=

        Yil
bir     2021
ikki    2020
19      2019
18      2018
17      2017
16      2016
15      2015
10      2010

CodePudding user response:

df['column_name'] returns a Series containing that column.

df[['column_name']] returns a DataFrame that has one column, named column_name.

You clearly noticed that already.

DataFrames have some different methods available to them than Series do, so it's hard to tell which one you want to use without more info.
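
To make the difference concrete, here is a minimal sketch. The frame is hypothetical, loosely modeled on the 'Yil' column from the question. It assumes the usual Python rule that square brackets are shorthand for a call to `__getitem__`, so the thing inside the brackets is a lookup key rather than a constructor argument such as data=.

```python
import pandas as pd

# Hypothetical frame, loosely modeled on the question's 'Yil' column.
df = pd.DataFrame({"Yil": [2021, 2020, 2019]}, index=["bir", "ikki", "19"])

# df[key] is shorthand for df.__getitem__(key). The key is a lookup key,
# not the data= argument of the DataFrame constructor.
as_series = df["Yil"]      # key is a string -> that column as a Series
as_frame = df[["Yil"]]     # key is a list   -> DataFrame with the listed columns

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)

# Calling the method explicitly gives the same results as the brackets.
same_series = df.__getitem__("Yil").equals(as_series)
same_frame = df.__getitem__(["Yil"]).equals(as_frame)
```

In other words, the outer brackets are the indexing operation and the inner brackets are an ordinary Python list literal; pandas decides what to return based on the type of key it receives.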

CodePudding user response:

For selecting certain columns of a DataFrame, the key can't be just any iterable. (For example, strings are iterable, but a string is treated as a single column label.) According to the documentation, it has to be a list, although from some quick testing, some other iterables also work:

Iterators

In [2]: df = pd.DataFrame({'a': [2, 3], 'b': [4, 5], 'c': [6, 7]})

In [3]: df[['a']]
Out[3]: 
   a
0  2
1  3

In [4]: df[iter(['a'])]  # Dummy iterator
Out[4]: 
   a
0  2
1  3

In [5]: df[(x for x in ['a'])]  # Dummy generator, a kind of iterator
Out[5]: 
   a
0  2
1  3

Ranges

In [6]: df1 = pd.DataFrame([['a', 'b'], ['c', 'd']])

In [7]: df1[range(1)]
Out[7]: 
   0
0  a
1  c

Dicts and sets have also worked, but that usage is deprecated (as far as I know, pandas 2.0 and later raise a TypeError for it).


In contrast, a tuple cannot be used to select multiple columns:

In [8]: df[('a',)]
Traceback (most recent call last):
  ...
KeyError: ('a',)

This is because a tuple is treated as a single key, which is what makes multilevel column indexing possible:

In [9]: df2 = pd.DataFrame(
   ...:    [[2, 4], [3, 5]],
   ...:    columns=pd.MultiIndex.from_tuples([('a', 'b'), ('a', 'c')]))

In [10]: df2
Out[10]: 
   a   
   b  c
0  2  4
1  3  5

In [11]: df2[('a', 'c')]
Out[11]: 
0    4
1    5
Name: (a, c), dtype: int64
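
The tuple-versus-list distinction can be sketched as follows, reusing the frames from the sessions above. The variable names (`single`, `wrapped`, `wanted`, `subset`) are made up for illustration. The last part assumes you hold column names in a tuple and want several flat columns, in which case converting the tuple to a list first is one way to do it.

```python
import pandas as pd

# Same MultiIndex frame as df2 above.
df2 = pd.DataFrame(
    [[2, 4], [3, 5]],
    columns=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")]),
)

single = df2[("a", "c")]     # tuple -> one multilevel key -> Series
wrapped = df2[[("a", "c")]]  # list holding one tuple -> one-column DataFrame

# Same flat frame as df above.
df = pd.DataFrame({"a": [2, 3], "b": [4, 5], "c": [6, 7]})
wanted = ("a", "c")          # hypothetical tuple of column names
subset = df[list(wanted)]    # converted to a list -> selects both columns

print(type(single).__name__, type(wrapped).__name__)  # Series DataFrame
print(list(subset.columns))                           # ['a', 'c']
```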