I have two questions, but first I will give the context. I am trying to use a pandas DataFrame with some existing code that follows a functional programming approach. I basically want to map a function over every row of a DataFrame, expanding the row using the double-asterisk keyword-argument notation, where each column name of the DataFrame corresponds to one of the arguments of the existing function.
For example, say I have the following function.
def line(m, x, b):
    y = (m * x) + b
    return y
And I have a pandas DataFrame:
import pandas as pd

data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)
# Returns
#    b  m  x
# 0  1  1  2
# 1  2  2  3
Ultimately, I want to construct a column in the DataFrame from the results of line applied to each row; something like the following.
# Note that I'm using the list of dicts defined above, not the DataFrame.
results = [line(**datum) for datum in data]
I feel like I should be able to use some combination of DataFrame.apply, a lambda, probably Series.to_dict, and the double-asterisk keyword-argument expansion, but I can't figure out what is passed to the lambda in the following expression.
df.apply(lambda x: x, axis=1)
#               ^
# What is pandas passing to my identity lambda?
I've tried to inspect it with type and x.__class__, but both of the following lines throw TypeErrors.
df.apply(lambda x: type(x), axis=1)
df.apply(lambda x: x.__class__, axis=1)
I don't want to write/refactor a new line function that can wrangle some pandas object because I shouldn't have to. Ultimately, I want to end up with a DataFrame with columns for the input data and a column with the corresponding output of the line function.
My two questions are:
- How can I pass a row of a pandas DataFrame to a function using keyword-argument expansion, either using the DataFrame.apply method or some other (functional) approach?
- What exactly is DataFrame.apply passing to the function that I specify?
Maybe there is some other functional approach I could take that I'm just not aware of, but I figure pandas is a pretty popular library for this kind of thing and that's why I'm trying to use it. Also there are some data (de)serialization issues I'm facing that pandas should make pretty easy vs. writing a more bespoke solution.
Thanks.
Answer:
Maybe this is what you are looking for.
1)
df.apply(lambda x: line(**x.to_dict()), axis=1)
Result
0 3
1 8
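To end up with the extra column the question asks for, the result of apply can be assigned back to the DataFrame. This is a minimal sketch, assuming the line function with the + operator restored and the sample data from the question; the column name y is my own choice, not something from the original code.

```python
import pandas as pd


def line(m, x, b):
    # Existing function, unchanged: it knows nothing about pandas.
    return (m * x) + b


data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Each row arrives as a Series; to_dict() turns it into
# {"b": ..., "m": ..., "x": ...}, which ** expands into keyword arguments.
# "y" is a hypothetical column name chosen for this example.
df["y"] = df.apply(lambda row: line(**row.to_dict()), axis=1)

print(df)
```

One thing to keep in mind: every column name has to match a parameter of line, otherwise the call fails with an unexpected-keyword error. If the DataFrame has extra columns, selecting the relevant ones first (for example df[["m", "x", "b"]]) should avoid that.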
2)
The function passed to df.apply(..., axis=1) receives a Series representing one row, with the column names as its index entries.
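If you want to check this yourself without returning a type object from apply, you can record the type as a side effect and return a plain value. The sketch below does that, and also shows an apply-free alternative based on DataFrame.to_dict with orient="records", which gives back a list of dicts like the one in the question. The helper name record_type and the list seen_types are made up for this illustration.

```python
import pandas as pd


def line(m, x, b):
    return (m * x) + b


df = pd.DataFrame([{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}])

# Record what apply hands to the function, returning a plain integer
# so that apply only has to assemble ordinary scalar results.
seen_types = []


def record_type(row):
    seen_types.append(type(row))
    return 0


df.apply(record_type, axis=1)
print(seen_types[0])  # the class of the object passed for each row

# A single row looks like this: the index holds the column names.
first_row = df.iloc[0]
print(list(first_row.index))

# Alternative without apply: convert the rows to a list of dicts
# and use the same comprehension as in the question.
records = df.to_dict(orient="records")
results = [line(**record) for record in records]
print(results)
```

The to_dict route keeps the existing function call untouched and may be easier to read if apply feels opaque. Note also that with axis=1 a row Series has a single dtype, so columns of mixed types are upcast to a common one before your function sees them.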