Three way conversation:
red_01 replies to green_01. green_01 replies back to red_01. This can be considered 1 three way conversation in the entire dataset.
So, I am trying to think of solution to query the count how many such conversations have occured in the entire dataset? This is 1 to 1 conversation system by the way.
I have already sorted by timestamp, used nth(0), nth(1) on red users. This preserved only first two messages sent by all the red users. I have used nth(0) on all green users. This preserves all the first messgaes sent by green users. But unable to think of anything to count the number of conversations that happened in exact sequence of three way conversation(red_0x sends message, green_0x replies, red_0x replies back)
I have a dataset like this:
conversation_id | user_id | messages | timestamp | |
---|---|---|---|---|
1 | red_01 | |||
1 | green_01 | |||
1 | red_01 |
CodePudding user response:
import pandas as pd
test_df = pd.DataFrame([[1,"red1","", 1],[1,"green1","", 2],[1,"red1","", 3], [2,"red1","", 1],[2,"red1","", 2],[2,"red1","", 3]],
columns = ["conversation_id","user_id","messages","timestamp"])
criteria = (
lambda x: len(x) == 3
and x.iloc[0].user_id.startswith("red")
and x.iloc[1].user_id.startswith("green")
and x.iloc[2].user_id.startswith("red")
)
n_groups = (
test_df
.sort_values("timestamp")
.groupby("conversation_id")
.filter(criteria)
).groupby("conversation_id").ngroups
n_groups
Output:
1