Home > database >  Python Pandas Dataframe By Column and Row Value
Python Pandas Dataframe By Column and Row Value

Time:06-10

I have a pandas dataframe with multiple column values that are the same. I'd like to select the series in the column where I pass it the column name, and then they have a unique identifier in one of the index/rows. My data frame looks like this:

                       TestPart     TestPart    OtherPart
attribute                                                                                  
Location                 Plant       Factory      Plant
Lead Time                25            56          30
Value1                   5             10          15
Value2                   A             B           C

There are many more parts, but these 2 serve as a good example. I'd like to select all the values in the column TestPart and Plant.

I'm able to isolate these to columns specifically with df.loc[:, "TestPart"], but I'd like to further isolate down by somehow selecting the column based on the Location value.

Is this possible? I've read through a lot of post trying iloc and other tricks, but can't seem to come up with the right recipe for this one yet.

EDIT:

Trying to add some more info for clarity. I have the dataframe shown above.

I would like to select the column with the Column = "TesPart" and the Location = "Plant".

The output I need is the entire series in that column like this:

attribute     
Location                         Plant
Lead Time                          25
Value1                             5
Value2                             A

CodePudding user response:

It sounds like what you’re looking for is df.loc[:, "TestPart"][df["Location"] == "Plant"]

CodePudding user response:

Here need compare row Location selected by DataFrame.loc and columns names, for filter columns use again DataFrame.loc with : for all rows and columns by condition:

df = df.loc[:, df.loc['Location'].eq('Plant') & (df.columns == 'TestPart')]
print (df)
          TestPart
Location     Plant
Lead Time       25
Value1           5
Value2           A

If create MultiIndex by original columns and first row of data solution is simplier - select by tuple:

df.columns = [df.columns, df.iloc[0]]
df1 = df.iloc[1:].reset_index(drop=True)
print (df1)
         TestPart         OtherPart
Location    Plant Factory     Plant
0              25      56        30
1               5      10        15
2               A       B         C

s = df1[('TestPart','Plant')]
print (s)
0    25
1     5
2     A
Name: (TestPart, Plant), dtype: object
  • Related