Home > Software design >  When creating a boolean series from subsetting a df, what does index-subsetting in the same line fil
When creating a boolean series from subsetting a df, what does index-subsetting in the same line fil

Time:08-29

I'm experimenting with pandas and noticed something that seemed odd.

If you have a boolean series defined as an object, you can then subset that object by index numbers, e.g.,

From df 'ah' enter image description here, creating creating this boolean series, 'tah' via

tah = ah['_merge']=='left_only'

enter image description here

This boolean series could be index-subset like this:

tah[0:1]

yielding: enter image description here

Yet if I tried to do this all in one line

ah['_merge']=='left_only'[0:1]

I get an unexpected output, where the boolean series is neither sliced nor seems to correspond to the subsetted-column: enter image description here

I've been experimenting and can't seem to determine what, in the all-in-on-line [0:1] is slicing/filtering-for. Any clarification would be appreciated!

CodePudding user response:

Because you are equating string's 'left_over' index of 0 (letter 'l') and this yields false result in every row and since 'l' is not equal to 'left_over' nor to 'both', it prints all a column of false booleans.

CodePudding user response:

You can use (ah['_merge']=='left_only')[0:1] as MattDMo mentionned in the comments.

Or you can also use pandas.Series.iloc with a slice object to select the elements you need based on their postion/index in your dataframe.

(ah['_merge']=='left_only').iloc[0:1]

Both of commands will return True since the first row of your dataframe has a 'left_only' type of merge.

  • Related