Pandas: Replace the first row of each group with second row if meet specific condition-CodePudding

I want to filter all qualified boss in the dataset. The rule is, each group has one boss, if the status of the boss is "Active" then its valid, if the statue of boss is "Inactive" then we need to replace it using the second row of that group.

For example, in row 1 group A, Allen is Boss but Inactive, then the second row of group A need to replace the first row. In group B, Kim is Boss and Active, so its valid.

Group	Name	Title	Status
A	Allen	Boss	Inactive
A	Lisa	n	Active
A	Louis	n	Active
B	Kim	Boss	Active
B	Derek	n	Active
B	Andrew	n	Active
B	Katie	n	Active
C	Lux	Boss	Inactive

In the end, all inactive Boss row should be filtered out, end result should be like below.

Group	Name	Title	Status
A	Lisa	n	Active
B	Kim	Boss	Active

Thanks in advance!

CodePudding user response：

How's something like this?

df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)

Output:

    Group   Name    Title       Status
0   A       Lisa    n           Active
1   B       Kim     Boss        Active

Stepping through it:

df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active"
drop_duplicates(subset=["Group"] drops all rows after the first occurrence of a new value in "Group" ... e.g returns the first row with group A, then the first row with Group B, etc
ignore_index=True) ignores the above rows' index and resets the index to start back at 0. Without this the index would be [1, 3]

CodePudding user response：

Drop the Inactive rows, and then take the first line from each group.

df[df.Status.ne('Inactive')].groupby('Group', as_index=False).first()

Output:

  Group  Name Title  Status
0     A  Lisa     n  Active
1     B   Kim  Boss  Active

CodePudding user response：

solustion

import pandas as pd
data=pd.read_csv("./level.csv")
# print(data)
group_A=data[data["Group"]=="A"]
group_B=data[data["Group"]=="B"]
group_C=data[data["Group"]=="C"]
for (key,value) in group_A.iterrows():
    if value["Status"]=="Active":
        value["Title"]="Boss"
        break
print(group_A)

output

   Group   Name Title    Status
0     A  Allen  Boss  Inactive
1     A   Lisa  Boss    Active
2     A  Louis     n    Active

CodePudding user response：

Extract rows with specific property

import pandas
with open("./group.csv") as group:
    data = pandas.read_csv(group)
# get a row data  having Boss title
    row_boss = data[data.Title == "Boss"]
    print(row_boss)
# get a row data  having Active status
    row_active = data[data["Status"] == "Active"]
    print(row_active)

no difference each of other ▶ data["Status"] = data.Status