Home > Blockchain >  Finding Relationships between members with Pandas
Finding Relationships between members with Pandas

Time:05-18

I have a CSV file with more than 10,000,000 rows of data with below structures:

I have a TeamID as my unique code, in each TeamID 2 members(Kevin,Smith) or 3 members (Kevin,John,Smith) can exist.

Data Format
TeamID      Members     MemberType
Team1       Kevin       Gold
Team1       John        Silver
Team1       Smith       Bronze
Team2       Kevin       Gold
Team2       Smith       Bronze
Team3       Kevin       Gold
Team3       Smith       Bronze
...

As you see above in some Teams John exists and in some other rows doesn't exist, but Kevin and Smith are always available in all teams. For determining relationship, Kevin is always Parent of John and John is always Parent of Smith but If John doesn't exist Kevin would be the parent of Smith:

TeamID      Members     MemberType  Parent
Team1       John        Silver      Kevin
Team1       Smith       Bronze      John
Team2       Smith       Bronze      Kevin
Team3       Smith       Bronze      Kevin
...

With this code below Kevin is always Parent and I don't consider John, I want to know if there is any way to consider this with above condition:

df["Parent"] = np.where(df["Members"].isin(['Kevin','John','Smith']), "Kevin", "-")

CodePudding user response:

You can sort by MemberType, shift per group, and join:

order = ['Gold', 'Silver', 'Bronze']
key = pd.Series({x: i for i,x in enumerate(order)})

df2 = df.sort_values(by='MemberType', key=key.get)

df.join(df2.groupby('TeamID')['Members'].shift().dropna().rename('Parent'),
        how='right')

output:

  TeamID Members MemberType Parent
1  Team1    John     Silver  Kevin
2  Team1   Smith     Bronze   John
4  Team2   Smith     Bronze  Kevin
6  Team3   Smith     Bronze  Kevin
  • Related