Difference between two different ways to query data

Time:03-24

I'm not sure whether I'm misunderstanding something or the code below is wrong, but as I understand it, the following two snippets should give the same output:

transactions[transactions['date'].str[14:16] == '21' | transactions['date'].str[14:16] == '22']

and

transactions.query("date.str.slice(11, 13) == '21' | date.str.slice(11, 13) == '22'")

I get an error when I run the first snippet:

transactions[transactions['date'].str[14:16] == '21' | transactions['date'].str[14:16] == '22']

error:

TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]

But when I use the .query method, I get no errors:

transactions.query("date.str.slice(11, 13) == '21' | date.str.slice(11, 13) == '22'")

How can I change the first snippet so that it gives the same output as the second one, without using the query method?

Answer:

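This is because of Python's operator precedence. | binds more tightly than ==, so '21' | transactions['date'].str[14:16] is evaluated first, before either == comparison, and that bitwise OR between a string and an object-dtype Series is what raises the TypeError. The .query version avoids this because its default 'pandas' parser gives | the same precedence as or, so both comparisons run before they are combined.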
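The grouping can be seen without pandas at all. Here is a minimal sketch in which plain integers stand in for the Series and string operands (the names a, b, c, d are only illustrative):

```python
# Plain integers stand in for the Series and string operands from the question.
a, b, c, d = 1, 1, 2, 2

# Without parentheses Python groups this as a == (b | c) == d,
# a chained comparison: (a == (b | c)) and ((b | c) == d).
unparenthesized = a == b | c == d

# With parentheses each comparison runs first, then the results are OR-ed.
parenthesized = (a == b) | (c == d)

print(unparenthesized)  # False, because b | c is 3
print(parenthesized)    # True
```

With integers the unparenthesized form silently gives a different answer; with a string and a Series, the b | c step is where the TypeError is raised.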

To fix the first line of code, simply add parentheses around the == conditions:

transactions[(transactions['date'].str[14:16] == '21') | (transactions['date'].str[14:16] == '22')]

Or, if you prefer, use .eq instead of ==:

transactions[transactions['date'].str[14:16].eq('21') | transactions['date'].str[14:16].eq('22')]
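To check that these forms really select the same rows, here is a small sketch on made-up data. The sample frame, its amount column, and the date format are assumptions, since the question does not show the real transactions data. Note also that the question's two snippets slice different positions (14:16 versus 11:13), so they would only agree if both used the same slice; this sketch uses 14:16 throughout. The .isin variant and the engine="python" argument are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real `transactions` frame is not shown in the question.
transactions = pd.DataFrame({
    "date": [
        "2021-03-24 10:21:00",
        "2021-03-24 11:22:30",
        "2021-03-24 12:23:45",
        "2021-03-24 13:21:10",
    ],
    "amount": [10, 20, 30, 40],
})

# Characters 14 and 15 of each date string (the minutes in this assumed format).
part = transactions["date"].str[14:16]

# Parenthesized comparisons, as in the first fix.
with_parens = transactions[(part == "21") | (part == "22")]

# Method form: .eq() is a call, so no precedence issue arises.
with_eq = transactions[part.eq("21") | part.eq("22")]

# Membership test, an alternative when comparing against several values.
with_isin = transactions[part.isin(["21", "22"])]

# The query form, using the same slice; engine="python" is passed here to avoid
# depending on whether numexpr is installed.
with_query = transactions.query(
    "date.str.slice(14, 16) == '21' | date.str.slice(14, 16) == '22'",
    engine="python",
)

print(with_parens)
```

Storing the sliced Series in a variable such as part also avoids computing the same slice twice.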