Why is an individual row in a Pandas dataframe returned as a tuple, and why are my tuples "broken"?

I am working with different data frames in pandas for Python 3 that contain different family relationships.

A sample data frame looks like this:

 i         function  \
0                                Anselm Franz Molitoris    Schwiegersohn   
1                                Anselm Franz Molitoris          Tochter   
2                                Anselm Franz Molitoris          Ehefrau   
3                                Anselm Franz Molitoris   Schwiegervater   
4                                Anselm Franz Molitoris          unknown   
...                                                 ...              ...   
1019                  Mauritius Ferdinand Anton Gudenus            Vater   
1020                  Mauritius Ferdinand Anton Gudenus           Mutter   
1021  Maria Magdalena Sidonia Gabriela Theresia Gudenus          Ehemann   
1022  Maria Magdalena Sidonia Gabriela Theresia Gudenus            Vater   
1023  Maria Magdalena Sidonia Gabriela Theresia Gudenus           Mutter   

                                    name ident  info  
0               Konrad Wilhelm Strecker     81  none  
1                          N. Molitoris    116  none  
2                    Maria Anna Gudenus   159   none  
3                 Johann Moritz Gudenus   231   none  
4                                         none  none  
...                                  ...   ...   ...  
1019             Daniel Morritz Gudenus    28   none  
1020   Anna Maria Barbara von Bielstein    364  none  
1021        Alexander Bernhard Strecker     75  none  
1022             Daniel Morritz Gudenus    28   none  
1023   Anna Maria Barbara von Bielstein    364  none

So they have 5 columns: i, function, name, ident and info.

I am using these two lines of code to read an individual row from selected data frames and print it:

for child in df_sibling2.iterrows():
    print(child)

Printing an individual row, I get this in the console output:

(24, i           Konrad Wilhelm Strecker
function                       Sohn
name                 Karl Strecker 
ident                            79
info                           none
Name: 24, dtype: object)

Checking the class, Python tells me that the type is a tuple. However, something is clearly wrong because there are no commas separating the actual values, and the header column of my data frame is part of the tuple data.

Was I using the wrong function to read the individual row in the first place, or is there another issue? I need individual rows so that I can write this information to different Excel sheets, so any data type that permits value selection by index is fine for me. A tuple is indexable and would thus be perfect, but what I am getting now is a mess. Help is very much appreciated.

CodePudding user response：

The docs for iterrows() state:

Iterate over DataFrame rows as (index, Series) pairs.

Here "pairs" means "tuples of length 2".

The docs further state:

Yields:

index: label or tuple of label
The index of the row. A tuple for a MultiIndex.
data: Series
The data of the row as a Series.

When you call print(child), you are seeing a tuple of length 2. Its first element is 24 (the index label of the row), followed by a comma. Its second element is the row's data as a Series whose own index consists of the column names i, function, name, ident and info. That is why the column names show up inside the printed tuple.
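Since each item is an (index, Series) pair, it can be unpacked directly in the for statement. Here is a minimal sketch, assuming a one-row frame rebuilt by hand from the row printed in the question (the real df_sibling2 is not shown, so this construction is only a stand-in):

```python
import pandas as pd

# Stand-in for df_sibling2: one row copied from the question's printed output.
df_sibling2 = pd.DataFrame(
    {
        "i": ["Konrad Wilhelm Strecker"],
        "function": ["Sohn"],
        "name": ["Karl Strecker"],
        "ident": ["79"],
        "info": ["none"],
    },
    index=[24],
)

rows = []
for index, child in df_sibling2.iterrows():
    # child is a Series: select a value by column label or by position.
    rows.append((index, child["name"], child.iloc[1]))

print(rows)
```

Selecting by label (child["name"]) is usually easier to read than selecting by position (child.iloc[1]), but both work on the Series.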

CodePudding user response：

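.iterrows() yields tuples of 2 elements, separated by a comma. The second element is a pandas Series, which has its own multi-line display representation. That is why the printed tuple does not look like a flat, comma-separated list of values.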

Simple example:

>>> import pandas as pd
>>> df = pd.DataFrame({"a":2, "b":3}, index=["x"])
>>> row = next(df.iterrows())
>>> row
('x', a    2
b    3
Name: x, dtype: int64)
>>> type(row)
<class 'tuple'>
>>> type(row[0])
<class 'str'>
>>> type(row[1])
<class 'pandas.core.series.Series'>
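If flat, positionally indexable tuples are what you are after, DataFrame.itertuples() may be a closer fit than iterrows(). A short sketch using the same toy frame as above; passing name=None asks for regular tuples instead of namedtuples, and index=False leaves the index label out:

```python
import pandas as pd

df = pd.DataFrame({"a": 2, "b": 3}, index=["x"])

# One plain tuple per row, holding only the column values.
plain_rows = list(df.itertuples(index=False, name=None))

# Default index=True puts the index label first in each tuple.
with_index = list(df.itertuples(name=None))

print(plain_rows)
print(with_index)
```

Each tuple can then be indexed by position, for example plain_rows[0][1] for the value in column b.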