I am working with the following dataframe and want to generate columns with 'grandchildren'. I wrote the function find_grandchild_list to extract the 'grandchildren' and tried to run it on the last column of every row using iloc inside apply, but got the error 'too many indexers'. When I refer to the same column by name inside apply, I get the desired result.
import pandas as pd
data = {'Parent':['plant','plant','plant','cactus','algae','tropical_plant','cactus','monstrera','blue_cactus','light_blue_cactus'],
'Child': ['cactus','algae','tropical_plant','aloe_vera','green_algae','monstrera','blue_cactus','monkey_monstrera','light_blue_cactus','desert_blue_cactus_lightblue']}
df = pd.DataFrame(data)
df
def find_grandchild_list(row):
    grandchild_value = df.iloc[:, 0] == row
    return [df[grandchild_value].iloc[:, -1]]
I want my final dataframe to look like this:
plant | cactus | aloe_vera
plant | cactus | blue_cactus | light_blue_cactus | desert_blue_cactus_lightblue
plant | algae | green_algae
plant | tropical_plant | monstrera | monkey_monstrera
successful:
df.apply(lambda row : find_grandchild_list(row['Child']), axis=1)
error:
df.apply(lambda row : find_grandchild_list(row.iloc[:,-1]), axis=1)
For my final script, I cannot use the column name, because I want to call apply repeatedly and always run it on the last column. My error is probably due to a poor understanding of iloc, but I couldn't find documentation on iloc in the context of apply.
CodePudding user response:
You are applying your lambda function to each row by specifying axis=1. Therefore, row in your lambda function is a pd.Series corresponding to df.iloc[0], df.iloc[1], and so on.
df.apply(lambda row : print(type(row)), axis=1)
>>>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
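Because a pd.Series has a 1-dimensional index, it accepts only one indexer: you can use row['Child'], row[1] (positional fallback on a labelled index, deprecated in recent pandas versions), or row.iloc[1]. row.iloc[:, -1] passes two indexers to a 1-dimensional object, which is what raises 'too many indexers'. To always take the last column regardless of its name, use row.iloc[-1].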
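A minimal sketch of the difference, assuming a shortened three-row stand-in for the dataframe in the question. The helper here returns a plain list via tolist() instead of a list wrapping a Series, which is my own simplification and not part of the original function. The try/except only exists to show that two indexers on a row reproduce the error.

```python
import pandas as pd

# Shortened stand-in for the question's dataframe (assumption: same column layout).
df = pd.DataFrame({
    "Parent": ["plant", "cactus", "blue_cactus"],
    "Child": ["cactus", "blue_cactus", "light_blue_cactus"],
})

def find_grandchild_list(value):
    # Rows whose first column equals `value`; return their last-column entries.
    mask = df.iloc[:, 0] == value
    return df.loc[mask].iloc[:, -1].tolist()

# Each row passed by apply(axis=1) is a 1-D Series, so one positional
# indexer is enough to pick the last column, whatever it is called.
grandchildren = df.apply(lambda row: find_grandchild_list(row.iloc[-1]), axis=1)
print(grandchildren.tolist())

# Two indexers on a 1-D Series fail, as in the question.
error_message = None
try:
    df.iloc[0].iloc[:, -1]
except Exception as exc:  # pandas raises its IndexingError here
    error_message = str(exc)
print(error_message)
```

Since row.iloc[-1] never mentions a column name, the same lambda can be reused after new columns have been appended to the dataframe.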