Transforming in Pandas GroupBy-CodePudding

Index	E_Id	P_Id	Date
121	701	9002	2021
122	701	9001	2019
123	702	9002	2021
124	702	9002	2019
125	703	9001	2021
126	704	9002	2019
127	704	9003	2019

Now I want to Create another DataFrame groupedby E_Id But I want to Count the number of rows against each P_Id call it 'x', and then sum of 'x' for whoever is linked with each E_Id

E_Id	TotalOfPIds
701	6

Can anyone help me?

As an intermediary step, I did this:

data['_Pid_Total'] = data.groupby('P_Id')[['P_Id']].transform('count')

And then for single E_Id it works like this:

data.loc[data['E_Id'] == '701', ['E_Id', 'P_Id', '_Pid_Total']].groupby(['E_Id', 'P_Id']).first().sum() returning a single integer. However I want to use this in Transform method or just do it for entire DataFrame.

CodePudding user response：

If I understand you correctly, you want to total sum of P_Ids found in each group:

out = (
    df.groupby("E_Id")
    .apply(lambda x: df["P_Id"].isin(x["P_Id"]).sum())
    .to_frame(name="TotalOfPIds")
    .reset_index()
)
print(out)

Prints:

   E_Id  TotalOfPIds
0   701            6
1   702            4
2   703            2
3   704            5

CodePudding user response：

Thank you for an interesting question. My approach is:

You creates a df1 with unique combination of E_id and P_id
Create df2 to count number of unique occur of P_id
Merge df1 and df2, then use groupby with sum

I modify your sample a bit

# Dataframe:
df = pd.DataFrame({'e':[701, 701,701,701, 701, 701,702,702,703,704, 704, 704],
                   'p':[9002, 9002, 9002, 9002, 9001, 9003, 9002, 9002, 9001, 9002, 9003, 9001]})

# Step 1: Unique combination
df1 = df.groupby(['e', 'p']).count().reset_index()

# Step 2: Count Number of occurs for each unique value P_id
df2 = df.groupby('p').count().reset_index().rename(columns={'e':'val'})

# Step 3: Merge then use groupby and sum
df3 = pd.merge(df1, df2, on=['p'], how='left')
df3.groupby('e')[['val']].sum().reset_index()