Index | E_Id | P_Id | Date |
---|---|---|---|
121 | 701 | 9002 | 2021 |
122 | 701 | 9001 | 2019 |
123 | 702 | 9002 | 2021 |
124 | 702 | 9002 | 2019 |
125 | 703 | 9001 | 2021 |
126 | 704 | 9002 | 2019 |
127 | 704 | 9003 | 2019 |
Now I want to Create another DataFrame groupedby E_Id But I want to Count the number of rows against each P_Id call it 'x', and then sum of 'x' for whoever is linked with each E_Id
So
E_Id | TotalOfPIds |
---|---|
701 | 6 |
Can anyone help me?
As an intermediary step, I did this:
data['_Pid_Total'] = data.groupby('P_Id')[['P_Id']].transform('count')
And then for single E_Id it works like this:
data.loc[data['E_Id'] == '701', ['E_Id', 'P_Id', '_Pid_Total']].groupby(['E_Id', 'P_Id']).first().sum()
returning a single integer. However I want to use this in Transform method or just do it for entire DataFrame.
CodePudding user response:
If I understand you correctly, you want to total sum of P_Ids found in each group:
out = (
df.groupby("E_Id")
.apply(lambda x: df["P_Id"].isin(x["P_Id"]).sum())
.to_frame(name="TotalOfPIds")
.reset_index()
)
print(out)
Prints:
E_Id TotalOfPIds
0 701 6
1 702 4
2 703 2
3 704 5
CodePudding user response:
Thank you for an interesting question. My approach is:
- You creates a
df1
with unique combination of E_id and P_id - Create
df2
to count number of unique occur of P_id - Merge
df1
anddf2
, then usegroupby
withsum
I modify your sample a bit
# Dataframe:
df = pd.DataFrame({'e':[701, 701,701,701, 701, 701,702,702,703,704, 704, 704],
'p':[9002, 9002, 9002, 9002, 9001, 9003, 9002, 9002, 9001, 9002, 9003, 9001]})
# Step 1: Unique combination
df1 = df.groupby(['e', 'p']).count().reset_index()
# Step 2: Count Number of occurs for each unique value P_id
df2 = df.groupby('p').count().reset_index().rename(columns={'e':'val'})
# Step 3: Merge then use groupby and sum
df3 = pd.merge(df1, df2, on=['p'], how='left')
df3.groupby('e')[['val']].sum().reset_index()