I have a DataFrame with a five-level MultiIndex, and I want to drop only the rows that have 'car' at level 0 and 'p1' at level 3. However, when I use .drop I have to spell out the first four index levels, 'car','valueA','row','p1', to drop the rows I want.
How can I drop rows from a multi-indexed DataFrame using something like this command:
dataFrame.drop(('car',None,None,'p1'), axis=0, inplace=True)
Here is my code and the resulting DataFrame, where I manage to drop the rows only by giving all four leading levels 'car','valueA','row','p1':
Code:
import numpy as np
import pandas as pd
# multiindex array
arr = [np.array(['car', 'car', 'car','car', 'car', 'car', 'car', 'car', 'car', 'truck', 'truck', 'truck', 'truck', 'truck', 'truck','truck', 'truck', 'truck','bike','bike', 'bike','bike','bike', 'bike','bike','bike', 'bike']),
np.array(['valueA', 'valueA','valueA', 'valueA','valueA', 'valueA','valueA', 'valueA','valueA','valueB','valueB','valueB','valueB','valueB','valueB','valueB','valueB','valueB', 'valueC','valueC','valueC','valueC','valueC','valueC','valueC','valueC','valueC']),
np.array(['row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row']),
np.array(['p1','p1','p1','p2','p2','p2','p3','p3','p3','p1','p1','p1','p2','p2','p2','p3','p3','p3','p1','p1','p1','p2','p2','p2','p3','p3','p3',]),
np.array(['1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3',])]
# forming multiindex dataframe
dataFrame = pd.DataFrame(
    np.random.randn(27, 3), index=arr, columns=['Col 1', 'Col 2', 'Col 3'])
dataFrame.index.names = ['level 0', 'level 1','level 2','level 3','level 4']
print(dataFrame)
print("\nDropping specific row...\n")
dataFrame.drop(('car','valueA','row','p1'), axis=0, inplace=True)
print(dataFrame)
Dataframe after dropping:
                                            Col 1     Col 2     Col 3
level 0 level 1 level 2 level 3 level 4
car     valueA  row     p2      1       -0.202113  0.475475  0.871960
                                2        0.776150  1.435102 -0.756707
                                3        0.117550  0.120139  0.718093
                        p3      1       -1.141276 -0.656897  1.296046
                                2        1.632846  1.689873 -0.992740
                                3        0.207730 -0.007627  0.331016
truck   valueB  row     p1      1       -0.510714 -0.471667  1.423341
                                2       -0.753657  0.352551  0.688307
                                3       -0.824962  0.729206  0.295181
                        p2      1       -1.668048  0.883333  0.077169
                                2        0.496375  0.002827  0.202063
                                3        1.446275 -0.349694 -1.215787
                        p3      1        0.609428  2.184825  1.619343
                                2        0.039672 -0.338794 -1.023429
                                3        1.583751 -0.931371  0.784551
bike    valueC  row     p1      1       -0.896791  0.049717  1.555789
                                2        0.117095  1.407567  1.398970
                                3        0.813442  0.440550 -0.808965
                        p2      1        0.984040 -0.347328 -1.139446
                                2       -0.363173 -0.710894  2.973986
                                3       -0.810208  0.004661 -0.006106
                        p3      1        1.247540 -1.260834  0.139684
                                2        0.609170  1.841452  0.965086
                                3       -0.648415 -0.138171  0.697330
Answer:
Prerequisite: IndexSlice
You can use pandas.IndexSlice to select rows by individual MultiIndex levels. A bare : means "any value" at that level:
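idx = pd.IndexSlice
# row selector first, then ':' for all columns (the pandas docs recommend specifying both axes)
dataFrame.loc[idx['car', :, :, 'p1'], :]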
output:
                                          Col 1   Col 2   Col 3
level 0 level 1 level 2 level 3 level 4
car     valueA  row     p1      1        0.7433  0.7007  1.0691
                                2       -1.1336 -1.0243 -0.6874
                                3        0.2181  0.1967  1.6890
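For a reproducible check of which rows the slicer matches, here is a minimal sketch. It assumes a compact stand-in for the question's frame: the names `tuples`, `index`, `df` and `selected` are illustrative, and the values are consecutive integers rather than random numbers so the selected rows are easy to recognize.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: same five index levels,
# but integer values instead of random ones.
tuples = [
    (vehicle, value, "row", part, num)
    for vehicle, value in [("car", "valueA"), ("truck", "valueB"), ("bike", "valueC")]
    for part in ("p1", "p2", "p3")
    for num in ("1", "2", "3")
]
index = pd.MultiIndex.from_tuples(
    tuples, names=["level 0", "level 1", "level 2", "level 3", "level 4"]
)
df = pd.DataFrame(
    np.arange(27 * 3).reshape(27, 3),
    index=index,
    columns=["Col 1", "Col 2", "Col 3"],
)

idx = pd.IndexSlice
# Match 'car' at level 0 and 'p1' at level 3; ':' leaves levels 1 and 2 open.
# The trailing ', :' selects all columns.
selected = df.loc[idx["car", :, :, "p1"], :]
print(selected)
```

Level 4 is not mentioned in the slicer, so all of its values are matched as well.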
Now, let's drop:
To drop, use the selection above to get the index labels of the matching rows, then pass them to drop:
to_drop = dataFrame.loc[idx['car', :, :, 'p1'], :].index
dataFrame.drop(to_drop)  # returns a new DataFrame; add inplace=True to modify dataFrame itself
output:
                                          Col 1   Col 2   Col 3
level 0 level 1 level 2 level 3 level 4
car     valueA  row     p2      1        0.3053 -1.3057 -0.1287
                                2        2.5257 -1.6639 -0.5921
                                3        0.8080 -0.2103 -1.1286
                        p3      1       -0.7016  0.1553  2.1906
                                2        0.5787  0.2155 -1.0574
                                3       -0.4153  0.1872  0.2001
truck   valueB  row     p1      1       -1.2780  1.3715 -0.0653
                                2        0.2365 -0.0084 -0.4676
                                3        0.7442  0.0395  1.2570
                        p2      1        0.2128  0.0567 -0.6916
                                2       -0.7449 -0.3231 -1.3954
                                3       -0.3366 -2.1328 -0.9524
                        p3      1       -0.1372 -2.3368  0.3554
                                2       -0.3781 -0.9169  0.2724
                                3       -0.0303  0.2812 -1.0810
bike    valueC  row     p1      1       -0.4342  0.9801  0.2852
                                2        0.9794  0.7521 -0.6850
                                3        0.6731 -1.2610  1.0722
                        p2      1        1.0940  0.4086  0.9345
                                2        0.1387  0.7512 -1.0006
                                3       -0.1079 -0.1318  0.9483
                        p3      1       -0.8483 -0.7513 -0.2429
                                2       -1.6328  1.8877 -0.5835
                                3        1.1729 -1.0088  1.0520
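As an alternative that is not part of the answer above, the same rows can probably be removed with a boolean mask built from Index.get_level_values. The sketch below assumes a stand-in frame with the question's five index levels and integer values; `df`, `mask`, `result` and `expected` are illustrative names. It also compares the result against the IndexSlice-based drop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame (same five index levels).
tuples = [
    (vehicle, value, "row", part, num)
    for vehicle, value in [("car", "valueA"), ("truck", "valueB"), ("bike", "valueC")]
    for part in ("p1", "p2", "p3")
    for num in ("1", "2", "3")
]
index = pd.MultiIndex.from_tuples(
    tuples, names=["level 0", "level 1", "level 2", "level 3", "level 4"]
)
df = pd.DataFrame(
    np.arange(27 * 3).reshape(27, 3),
    index=index,
    columns=["Col 1", "Col 2", "Col 3"],
)

# True for rows that have 'car' at level 0 and 'p1' at level 3.
level0 = df.index.get_level_values("level 0")
level3 = df.index.get_level_values("level 3")
mask = (level0 == "car") & (level3 == "p1")

# Keep every row that is NOT matched by the mask.
result = df[~mask]

# Cross-check against the IndexSlice-based drop.
idx = pd.IndexSlice
expected = df.drop(df.loc[idx["car", :, :, "p1"], :].index)
print(result.equals(expected))
print(len(df), len(result))
```

The mask version never looks up labels, so it also behaves sensibly when no row matches: it simply keeps everything.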