I have:
+------+------+
| ColA | ColB |
+------+------+
| A    | B    |
| A    | D    |
| C    | U    |
| B    | B    |
| A    | B    |
+------+------+
and I want to get:
+------+------+
| ColA | ColB |
+------+------+
| A    | D    |
| C    | U    |
| B    | B    |
+------+------+
I want to "remove" all rows with the combination of "colA == A and colB == B". When I tried this SQL Statement
SELECT * FROM table where (colA != 'A' and colB != 'B')
worked fine.
But when I try to translate it to Spark (or even to pandas), I get an error:
Py4JError: An error occurred while calling o109.and. Trace:...
#spark
sparkDF.where((sparkDF['colA'] != 'A' & sparkDF['colB'] != 'B')).show()
#pandas
pandasDF[(pandasDF["colA"]!="A" & pandasDF["colB"]!="B")]
What am I doing wrong here?
CodePudding user response:
You need to add parentheses around each comparison and use | for bitwise OR: in Python, & binds more tightly than comparisons like !=, so without parentheses the expression is evaluated as 'A' & sparkDF['colB'] first (that is what triggers the Py4J error), and removing rows where colA == 'A' and colB == 'B' is the same as keeping rows where colA != 'A' or colB != 'B':
pandasDF[(pandasDF["colA"]!="A") | (pandasDF["colB"]!="B")]
sparkDF.where((sparkDF['colA'] != 'A') | (sparkDF['colB'] != 'B')).show()
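As a minimal end-to-end sketch with the sample data from the question (the PySpark part assumes you already have a SparkSession named spark), you can also express the "drop rows where colA == 'A' and colB == 'B'" intent directly with ~ and &, which is equivalent by De Morgan's law:
#pandas
import pandas as pd
pandasDF = pd.DataFrame({"colA": ["A", "A", "C", "B", "A"],
                         "colB": ["B", "D", "U", "B", "B"]})
# negate the combined condition instead of OR-ing the negated comparisons
print(pandasDF[~((pandasDF["colA"] == "A") & (pandasDF["colB"] == "B"))])
#spark
sparkDF = spark.createDataFrame(pandasDF)
sparkDF.where(~((sparkDF["colA"] == "A") & (sparkDF["colB"] == "B"))).show()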