Here is my dataframe:
import pandas as pd

df = pd.DataFrame({'First Name': ['George', 'Alex', 'Leo'],
                   'Surname': ['Davis', 'Mulan', 'Carlitos'],
                   'Age': [10, 15, 20],
                   'Size': [30, 40, 50]})
Output:
  First Name   Surname  Age  Size
0     George     Davis   10    30
1       Alex     Mulan   15    40
2        Leo  Carlitos   20    50
And here is a function:
def myfunc(firstname, surname):
    print(firstname + ' ' + surname)
Now, I would like to iterate through the dataframe and check for the following conditions:
- IF df['Age'] > 11 AND df['Size'] < 51
If there is a match (here, the second and third rows), I would like to call myfunc and pass in the data from each matching row as its arguments.
For each matching row, myfunc would be called with that row's values:
myfunc(row['First Name'], row['Surname'])
So in this example the output after running the code would be:
'Alex Mulan'
'Leo Carlitos'
(The IF, AND condition was true in the second and third row.)
Please explain how I could achieve this and provide a working code snippet.
I would prefer a solution where no additional column is created, if that remains practical. Otherwise, a new column can be created and added to the dataframe.
Answer:
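Use .loc to filter your dataframe, then apply your function. A lambda function serves as a proxy to call your function with the right signature. Here myfunc returns the string instead of printing it, so the results are collected in a Series: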
def myfunc(firstname, surname):
    return firstname + ' ' + surname
out = df.loc[df['Age'].gt(11) & df['Size'].lt(51), ['First Name', 'Surname']] \
.apply(lambda x: myfunc(*x), axis=1)
Output:
>>> out
1 Alex Mulan
2 Leo Carlitos
dtype: object
>>> type(out)
pandas.core.series.Series
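Note that myfunc(*x) unpacks the row values positionally, so it relies on 'First Name' coming before 'Surname' in the column list. A minimal sketch of a variant that looks the values up by column name instead, so the call does not depend on column order (the name mask is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['George', 'Alex', 'Leo'],
                   'Surname': ['Davis', 'Mulan', 'Carlitos'],
                   'Age': [10, 15, 20],
                   'Size': [30, 40, 50]})

def myfunc(firstname, surname):
    return firstname + ' ' + surname

# Boolean mask for the IF/AND condition; no column is added to df.
mask = df['Age'].gt(11) & df['Size'].lt(51)

# Pass each argument by column name rather than unpacking with *x.
out = df.loc[mask].apply(lambda row: myfunc(row['First Name'], row['Surname']), axis=1)
print(out)
```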
Answer:
Try agg: filter the rows and the name columns with .loc, then join each row's names with a space:
df.loc[(df.Age>11) & (df.Size<51),['First Name','Surname']].agg(' '.join,1)
Out[124]:
1 Alex Mulan
2 Leo Carlitos
dtype: object
Answer:
First select the required records using indexing, then concatenate the names:
selected = df[(df['Age'] > 11) & (df['Size'] < 51)]
print(selected['First Name'] + " " + selected['Surname'])
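If you want to call the question's printing myfunc once per matching row, a minimal sketch is to loop over the two selected columns with zip. zip is used here rather than itertuples because itertuples renames a column like 'First Name' (it contains a space) to a positional field name such as _1.

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['George', 'Alex', 'Leo'],
                   'Surname': ['Davis', 'Mulan', 'Carlitos'],
                   'Age': [10, 15, 20],
                   'Size': [30, 40, 50]})

def myfunc(firstname, surname):
    print(firstname + ' ' + surname)

# Select only the rows that satisfy the condition; df is not modified.
selected = df[(df['Age'] > 11) & (df['Size'] < 51)]

# Call myfunc once per matching row with that row's names.
for first, last in zip(selected['First Name'], selected['Surname']):
    myfunc(first, last)
```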
Edit: to pass each row to a generic function and ensure the right columns are passed, you can write a helper like this:
def apply(df, func, kwargs):
    # kwargs maps column names to func's parameter names
    return df[list(kwargs)].rename(columns=kwargs) \
        .apply(lambda row: func(**row), axis=1)
print(apply(df=selected,
func=myfunc,
kwargs={"First Name": "firstname", "Surname": "surname"}))
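Because the question's myfunc prints and returns None, the Series returned by the helper above would contain only None values. A minimal sketch with a returning function instead; full_name is a hypothetical name used for illustration, and the helper is repeated so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['George', 'Alex', 'Leo'],
                   'Surname': ['Davis', 'Mulan', 'Carlitos'],
                   'Age': [10, 15, 20],
                   'Size': [30, 40, 50]})

def apply(df, func, kwargs):
    # Keep the mapped columns, rename them to func's parameter names,
    # then call func(**row) on each row.
    return df[list(kwargs)].rename(columns=kwargs) \
        .apply(lambda row: func(**row), axis=1)

def full_name(firstname, surname):
    # Hypothetical helper: returns the name so it ends up in the Series.
    return firstname + ' ' + surname

selected = df[(df['Age'] > 11) & (df['Size'] < 51)]
names = apply(df=selected, func=full_name,
              kwargs={"First Name": "firstname", "Surname": "surname"})
print(names)
```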
There is often a more efficient, vectorized solution that passes whole columns to a function instead of applying it row by row.
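As a sketch of that idea: a myfunc that returns firstname + ' ' + surname works unchanged when given whole Series, because + on string Series concatenates element-wise. One call then handles every matching row:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['George', 'Alex', 'Leo'],
                   'Surname': ['Davis', 'Mulan', 'Carlitos'],
                   'Age': [10, 15, 20],
                   'Size': [30, 40, 50]})

def myfunc(firstname, surname):
    # With Series arguments, + concatenates element-wise.
    return firstname + ' ' + surname

selected = df[(df['Age'] > 11) & (df['Size'] < 51)]

# A single call on whole columns instead of one call per row.
names = myfunc(selected['First Name'], selected['Surname'])

for name in names:
    print(name)
```

This only works when the function body is built from operations that accept Series; a function that uses, for example, a plain Python if on its arguments would need the row-by-row approach.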