I'm looking for a more efficient way to grab a certain value at a certain index for every row in a Pandas DataFrame than the one I currently have. Here's the code I have so far. The .apply() call with a lambda is very slow, so I would love a more optimised version.
Let me know if you need more info/context, thanks!
def find_session_time_given_index(df, value):
    """
    Finds the minutes_in_session value where its corresponding index is the same as the value
    where the last true exists in the column last_true_index. Used in modify_events_table() in a .apply.
    :param df: pd.DataFrame
    :param value: row in .apply()
    :return: int
    """
    last_true = value["last_true_index"]
    return list(df[df["index"] == last_true]["minutes_in_session"])[0]

df["last_true_index"] = df["index"].mul(df["same_user_new_session"]).cummax()
df["minutes_of_last_true_index"] = df.apply(
    lambda x: find_session_time_given_index(df, x), axis=1
)
EDIT: Here's an example output:
index | timestamp | user_id | last_true_index | minutes_in_session | minutes_of_last_true_index |
---|---|---|---|---|---|
0 | Time1 | 1 | 0 | 100 | 100 |
1 | Time2 | 1 | 0 | 101 | 100 |
11 | Time3 | 2 | 11 | 54 | 54 |
12 | Time4 | 2 | 11 | 55 | 54 |
The minutes_of_last_true_index column is the one to be populated.
CodePudding user response:
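Try merging the DataFrame with itself, matching last_true_index on the left against the index column on the right:
df_new = pd.merge(df, df[['index', 'minutes_in_session']], how='left',
                  left_on='last_true_index', right_on='index', suffixes=('', '_'))
# The right-hand copies get the '_' suffix: drop the duplicate key, rename the looked-up value.
df_new = df_new.drop(columns=['index_'])\
    .rename(columns={'minutes_in_session_': 'minutes_of_last_true_index'})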
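If it helps, here is a small runnable sketch of the same lookup done with Series.map rather than a merge. This is an alternative I'm suggesting, not what the answer above does. The sample rows are copied from the example table in the question, and it assumes the values in the "index" column are unique (map would raise otherwise).

```python
import pandas as pd

# Sample rows copied from the example table in the question.
df = pd.DataFrame({
    "index": [0, 1, 11, 12],
    "timestamp": ["Time1", "Time2", "Time3", "Time4"],
    "user_id": [1, 1, 2, 2],
    "last_true_index": [0, 0, 11, 11],
    "minutes_in_session": [100, 101, 54, 55],
})

# Build a lookup Series keyed by the "index" column.
# Assumption: the values in "index" are unique.
lookup = df.set_index("index")["minutes_in_session"]

# Map every last_true_index through the lookup in one vectorised step,
# instead of filtering the whole frame once per row inside .apply().
df["minutes_of_last_true_index"] = df["last_true_index"].map(lookup)

print(df["minutes_of_last_true_index"].tolist())  # [100, 100, 54, 54]
```

The per-row .apply() version scans the whole frame for every row, so it gets slower quadratically as the frame grows, whereas the lookup here is built once and reused.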