I am working with several data frames in pandas (Python 3) that contain different family relationships.
A sample data frame looks like this:
i function \
0 Anselm Franz Molitoris Schwiegersohn
1 Anselm Franz Molitoris Tochter
2 Anselm Franz Molitoris Ehefrau
3 Anselm Franz Molitoris Schwiegervater
4 Anselm Franz Molitoris unknown
... ... ...
1019 Mauritius Ferdinand Anton Gudenus Vater
1020 Mauritius Ferdinand Anton Gudenus Mutter
1021 Maria Magdalena Sidonia Gabriela Theresia Gudenus Ehemann
1022 Maria Magdalena Sidonia Gabriela Theresia Gudenus Vater
1023 Maria Magdalena Sidonia Gabriela Theresia Gudenus Mutter
name ident info
0 Konrad Wilhelm Strecker 81 none
1 N. Molitoris 116 none
2 Maria Anna Gudenus 159 none
3 Johann Moritz Gudenus 231 none
4 none none
... ... ... ...
1019 Daniel Morritz Gudenus 28 none
1020 Anna Maria Barbara von Bielstein 364 none
1021 Alexander Bernhard Strecker 75 none
1022 Daniel Morritz Gudenus 28 none
1023 Anna Maria Barbara von Bielstein 364 none
So they have 5 columns: i, function, name, ident and info.
I am using these two lines of code to read an individual row from selected data frames and print it:
for child in df_sibling2.iterrows():
    print(child)
Printing an individual row, I get this in the console output:
(24, i Konrad Wilhelm Strecker
function Sohn
name Karl Strecker
ident 79
info none
Name: 24, dtype: object)
Checking the type, Python tells me that it is a tuple. However, something seems clearly wrong: there are no commas separating the actual values, and the column headers of my data frame appear as part of the tuple data.
Was I using the wrong function to read the individual row in the first place, or is there another issue? I need individual rows so that I can write this information to different Excel sheets, so any data type that permits value selection by index is fine for me. A tuple is indexable and would thus be perfect, but what I am getting now is a mess. Help is very much appreciated.
The docs for iterrows() state:
Iterate over DataFrame rows as (index, Series) pairs.
Here "pairs" means "tuples of length 2".
The docs further state:
Yields:
index: label or tuple of label
The index of the row. A tuple for a MultiIndex.
data: Series
The data of the row as a Series.
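Because each item is such a 2-tuple, the usual idiom is to unpack it directly in the for statement. A minimal sketch, assuming a small hypothetical frame that mirrors the row printed in the question (its contents are made up for illustration):

```python
import pandas as pd

# Hypothetical one-row frame mirroring the row shown in the question
df_sibling2 = pd.DataFrame(
    {
        "i": ["Konrad Wilhelm Strecker"],
        "function": ["Sohn"],
        "name": ["Karl Strecker"],
        "ident": [79],
        "info": ["none"],
    },
    index=[24],
)

# Unpack each (index, Series) pair instead of printing the whole tuple
for idx, row in df_sibling2.iterrows():
    # row is a Series, so its values can be read by column label
    print(idx, row["name"], row["ident"])
```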
When you call print(child), you're seeing a tuple of length 2 containing 24 (the index value for the row) as its first element, followed by a comma (,), which in turn is followed by the corresponding data from this row as a Series with index i, function, name, ident and info as the second value. That Series is displayed in its own multi-line format (one label and value per line), which is why its values are not separated by commas.
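If, as in the question, you need the row's values indexable by position, the Series can be converted to a plain tuple (or to a list with row.tolist()). A minimal sketch, again assuming a hypothetical one-row frame with the question's five columns; the resulting tuple keeps only the values, in column order:

```python
import pandas as pd

# Hypothetical frame with the same five columns as in the question
df = pd.DataFrame(
    {
        "i": ["Konrad Wilhelm Strecker"],
        "function": ["Sohn"],
        "name": ["Karl Strecker"],
        "ident": [79],
        "info": ["none"],
    },
    index=[24],
)

rows = []
for idx, row in df.iterrows():
    # Iterating a Series yields its values, so tuple(row) drops the column labels
    rows.append(tuple(row))

print(rows[0])     # values only, in column order
print(rows[0][2])  # positional access to the "name" column
```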
.iterrows() returns tuples of 2 elements, separated by a comma. The second element is a pandas Series, with its own display representation.
Simple example:
>>> import pandas as pd
>>> df = pd.DataFrame({"a": 2, "b": 3}, index=["x"])
>>> row = next(df.iterrows())
>>> row
('x', a 2
b 3
Name: x, dtype: int64)
>>> type(row)
<class 'tuple'>
>>> type(row[0])
<class 'str'>
>>> type(row[1])
<class 'pandas.core.series.Series'>
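An alternative not used in the answers above is DataFrame.itertuples(), which yields one tuple per row instead of an (index, Series) pair, so the values can be indexed directly. A minimal sketch, rebuilding the same toy frame so the snippet is self-contained; by default the tuples are namedtuples whose first field is the index label, while index=False with name=None gives plain tuples of the column values only:

```python
import pandas as pd

# Same toy frame as in the REPL example
df = pd.DataFrame({"a": 2, "b": 3}, index=["x"])

# Default: namedtuples; position 0 is the index label, then one field per column
for row in df.itertuples():
    print(row.Index, row.a, row[2])

# index=False, name=None: plain tuples holding only the column values
plain = list(df.itertuples(index=False, name=None))
print(plain)
```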