I am having an issue with manipulating indices once I have used the groupby command. My problem is similar to this code:
import pandas as pd
import numpy as np
np.random.seed(0)
df=pd.DataFrame(np.random.randint(0,10,size=(1000000,5)),columns=list('ABCDE'))
M=df.groupby(['A','B','D','E'])['C'].sum().unstack()
M
E 0 1 2 3 4 5 6 7 8 9
A B D
0 0 0 464 414 553 420 499 394 528 423 415 443
1 407 479 392 441 433 472 520 421 484 384
2 545 546 523 356 386 434 531 534 486 417
3 408 511 422 424 477 351 452 395 341 492
4 502 462 403 434 428 444 506 414 418 328
... ... ... ... ... ... ... ... ... ... ...
9 9 5 419 416 485 386 581 330 408 489 394 454
6 416 475 469 490 357 523 418 514 555 499
7 528 419 462 486 565 388 438 445 469 521
8 390 454 566 341 459 463 478 463 426 499
9 414 436 441 462 403 415 362 472 433 430
[1000 rows x 10 columns]
I am wondering how to filter down to only situations where B is greater than A, when they are both in the index here. If they weren't in the index then I would be doing something like M=M[M['A']<M['B']].
CodePudding user response:
You can temporarily convert the index to_frame
:
out = M.loc[M.index.to_frame().query('B>A').index]
Or use Index.get_level_values
:
A = M.index.get_level_values('A')
B = M.index.get_level_values('B')
out = M.loc[B>A]
Output:
E 0 1 2 3 4 5 6 7 8 9
A B D
0 1 0 489 452 421 455 442 377 440 476 477 451
1 468 448 473 443 557 492 471 460 476 469
2 576 472 465 355 503 448 491 437 546 425
3 404 438 474 516 410 446 411 459 467 450
4 500 418 441 445 420 605 467 580 479 377
... ... ... ... ... ... ... ... ... ... ...
8 9 5 390 466 436 493 446 508 375 390 485 393
6 457 478 476 417 458 460 361 397 432 403
7 516 587 379 406 396 449 430 433 357 432
8 390 460 489 427 346 490 498 454 395 345
9 474 510 466 336 484 577 443 428 459 406
[450 rows x 10 columns]