I have a pandas dataframe with multiple column values that are the same. I'd like to select the series in the column where I pass it the column name, and then they have a unique identifier in one of the index/rows. My data frame looks like this:
TestPart TestPart OtherPart
attribute
Location Plant Factory Plant
Lead Time 25 56 30
Value1 5 10 15
Value2 A B C
There are many more parts, but these 2 serve as a good example. I'd like to select all the values in the column TestPart and Plant.
I'm able to isolate these to columns specifically with df.loc[:, "TestPart"]
, but I'd like to further isolate down by somehow selecting the column based on the Location
value.
Is this possible? I've read through a lot of post trying iloc and other tricks, but can't seem to come up with the right recipe for this one yet.
EDIT:
Trying to add some more info for clarity. I have the dataframe shown above.
I would like to select the column with the Column = "TesPart" and the Location = "Plant".
The output I need is the entire series in that column like this:
attribute
Location Plant
Lead Time 25
Value1 5
Value2 A
CodePudding user response:
It sounds like what you’re looking for is df.loc[:, "TestPart"][df["Location"] == "Plant"]
CodePudding user response:
Here need compare row Location
selected by DataFrame.loc
and columns names, for filter columns use again DataFrame.loc
with :
for all rows and columns by condition:
df = df.loc[:, df.loc['Location'].eq('Plant') & (df.columns == 'TestPart')]
print (df)
TestPart
Location Plant
Lead Time 25
Value1 5
Value2 A
If create MultiIndex by original columns and first row of data solution is simplier - select by tuple:
df.columns = [df.columns, df.iloc[0]]
df1 = df.iloc[1:].reset_index(drop=True)
print (df1)
TestPart OtherPart
Location Plant Factory Plant
0 25 56 30
1 5 10 15
2 A B C
s = df1[('TestPart','Plant')]
print (s)
0 25
1 5
2 A
Name: (TestPart, Plant), dtype: object