I have a question that was already addressed in 2014 (see @Jeff's answer), but I would like to know whether things have changed since then.
I wrote some code that works for me but not for a colleague:
df2 = pd.DataFrame({'A': ["yes", "no", "yes", "no"], 'B': [1,2,1,3]})
A B
0 yes 1
1 no 2
2 yes 1
3 no 3
df = pd.DataFrame({'A': [7,5,9,4], 'B': [4,4,4,4]})
A B
0 7 4
1 5 4
2 9 4
3 4 4
df[df2["A"] == "yes"] = np.full(2, 20.0) * df.values[2]
This works perfectly fine for me and gives the desired output:
A B
0 180 80
1 5 4
2 180 80
3 4 4
However, it raises an error for my colleague:
ValueError: cannot set using a list-like indexer with a different length than the value
This is what @Jeff described in his answer: "Bottom line, don't use lists inside of a pandas object. Its not efficient, and just makes interpretation difficult / impossible."
Have there been any changes in pandas that now allow this way of coding? I didn't find anything in the documentation that warns against this possible misinterpretation.
If this is still bad practice, what is the recommended practice?
CodePudding user response:
I do not think the answer you are citing is relevant to your problem. Here you are assigning a NumPy array of shape (2,):
(np.full(2, 20.0) * df.values[2]).shape
to a DataFrame selection of shape (2, 2):
(df[df2["A"] == "yes"]).shape
which is handled perfectly well by NumPy broadcasting rules. @Jeff's answer is about plain lists, so I think your colleague just needs to upgrade pandas, NumPy, or both. You can check the installed versions in a terminal via, e.g.,
pip show pandas
pip show numpy
or in a Jupyter notebook via
!pip show pandas
!pip show numpy
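The same check can also be done from Python itself, which may be handy when you and your colleague want to compare environments from inside a script. This is only a minimal sketch: `installed_version` is a hypothetical helper name introduced here for illustration, built on the standard-library `importlib.metadata` module (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the two packages discussed in this answer.
for name in ("pandas", "numpy"):
    print(name, installed_version(name))
```

If the two machines print different versions, that difference is the first thing to rule out.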
Your code works in recent versions (for me, pandas 1.2.2 and numpy 1.19.5).
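To illustrate the broadcasting rule mentioned above, here is a small sketch in plain NumPy, using the same numbers as `df` in the question. It deliberately leaves pandas out (an assumption made to isolate the broadcasting step), so it shows what NumPy does with the shapes, not how every pandas version handles the assignment.

```python
import numpy as np

# Same numbers as df in the question, held in a plain NumPy array.
values = np.array([[7, 4], [5, 4], [9, 4], [4, 4]])

# Boolean row mask equivalent to df2["A"] == "yes".
mask = np.array([True, False, True, False])

# Right-hand side of the assignment: a 1-D array of shape (2,).
row = np.full(2, 20.0) * values[2]

# The masked selection has shape (2, 2); the (2,) array is broadcast
# across both selected rows, one value per column.
result = values.copy()
result[mask] = row

print(row.shape)           # (2,)
print(values[mask].shape)  # (2, 2)
print(result)
```

Broadcasting aligns the trailing axis, so the two values are matched to the two columns and repeated for every selected row. With two selected rows and two columns the shapes happen to coincide, which is worth keeping in mind when reading such an assignment.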