I have started learning pandas in Python. I am an R user who relies heavily on the tidyverse, so I am trying to use pandas in the same manner. The following code throws an error:
import pandas as pd

(
    pd.DataFrame(
        {'A':[1,2,3],
         'B':[4,5,6]}
    )
    .assign(A = lambda x: x.A + 1,
            B = lambda x: x.B + x.A,
            A = 1)
)
SyntaxError: keyword argument repeated: A
So how can I use pandas in a tidyverse manner? More specifically, is there a method in pandas that works like dplyr::mutate?
CodePudding user response:
Try not to assign A twice. Fold the A + 1 step into the expression for B instead:
In [154]: pd.DataFrame(
     ...:     {'A': [1, 2, 3],
     ...:      'B': [4, 5, 6]}
     ...: ).assign(
     ...:     B = lambda x: x.B + x.A + 1,
     ...:     A = 1
     ...: )
     ...:
Out[154]:
   A   B
0  1   6
1  1   8
2  1  10
In R's dplyr and tidyverse, assigning the same column twice with mutate is not necessary either.
When you do a groupby in pandas, transform is roughly equivalent to group_by followed by mutate in R, and agg is roughly equivalent to group_by followed by summarise in R.
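To make the transform/agg comparison concrete, here is a minimal sketch on made-up data. The column names g and v and the result names are assumptions for illustration only, not part of the question.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for this example.
df = pd.DataFrame({'g': ['a', 'a', 'b'],
                   'v': [1, 2, 10]})

# Roughly group_by(g) %>% mutate(v_mean = mean(v)):
# transform returns one value per original row, so the frame keeps its shape.
mutated = df.assign(v_mean = lambda x: x.groupby('g')['v'].transform('mean'))

# Roughly group_by(g) %>% summarise(v_mean = mean(v)):
# agg collapses each group to a single row.
summarised = df.groupby('g', as_index=False).agg(v_mean=('v', 'mean'))

print(mutated)
print(summarised)
```

Here mutated still has three rows, while summarised has one row per group.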
CodePudding user response:
One (maybe obvious) approach is to chain several assign calls:
(pd.DataFrame({'A':[1,2,3],
               'B':[4,5,6]})
   .assign(A = lambda x: x.A + 1,
           B = lambda x: x.B + x.A,)
   .assign(A = 1)
)
Another option is to use pipe with a function:
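def process(df):
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    df['A'] = df['A'] + 1
    df['B'] = df['A'] + df['B']  # uses the already incremented A
    df['A'] = 1
    return df

(pd.DataFrame({'A':[1,2,3],
               'B':[4,5,6]})
   .pipe(process)
)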
output:
   A   B
0  1   6
1  1   8
2  1  10
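If you want to convince yourself that the chained-assign version and the pipe version really agree, a small check along these lines may help. This is only a sketch: make_df and process_copy are hypothetical helper names introduced here, and process_copy works on a copy so the input frame stays untouched.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

def make_df():
    # Same toy data as in the question.
    return pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def process_copy(df):
    # Hypothetical helper: same steps as the pipe approach, applied to a copy.
    out = df.copy()
    out['A'] = out['A'] + 1
    out['B'] = out['B'] + out['A']
    out['A'] = 1
    return out

original = make_df()

# assign evaluates its keyword arguments in order, so B sees the updated A.
chained = (original
           .assign(A = lambda x: x.A + 1,
                   B = lambda x: x.B + x.A)
           .assign(A = 1))

piped = original.pipe(process_copy)

assert_frame_equal(chained, piped)       # both approaches give the same frame
assert_frame_equal(original, make_df())  # the input was not modified
```

Both assign and a copying pipe function leave the original DataFrame as it was, which is closer to how mutate behaves in dplyr.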