I have a dataframe with 3 columns: session_id, name, reset_flag.
I need to make a new column, new_name, where the new name is set to the first name where reset_flag=True, and then continues as that name WITHIN that session until there is a new reset_flag.
I'm not really sure of the best way to approach this.
EDIT: I thought of a way to do it with df.iterrows(), by storing values in a list and then appending, but it seems very bulky. Is there a more efficient 'pandas' way?
Sample expected output
session_id | name | reset_flag | new_name |
---|---|---|---|
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | TRUE | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_1 | | some_name_1 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | TRUE | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_2 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_3 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_3 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_4 | | some_name_2 |
06c97a-bc7-6cc-29f-65978ee8d | some_name_5 | TRUE | some_name_5 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | TRUE | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_1 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_2 | | some_name_1 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_3 | TRUE | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_3 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_4 | | some_name_3 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_5 | TRUE | some_name_5 |
3943d5-e1e-63e-6c4-aa1899bd9 | some_name_6 | | some_name_5 |
CodePudding user response:
Not sure if there is a more efficient way of doing it, but this should work:
df['new_name'] = None
session_name = None
for index, row in df.iterrows():
    # I assume the 'TRUE' in your col is str.
    if row['reset_flag'] == 'TRUE':
        # Remember the name on a reset row; it is carried forward below.
        session_name = row['name']
    # df.at writes a single cell and avoids chained assignment.
    df.at[index, 'new_name'] = session_name
CodePudding user response:
Apply the new name on reset rows and then forward-fill:
df['new_name'] = df.apply(lambda r: r['name'] if r['reset_flag'] else np.nan, axis=1).ffill()
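The same mask-then-fill idea can also be written without a row-wise apply. This is only a minimal sketch: it assumes reset_flag is a real boolean column, and the toy frame below is made up for illustration. Grouping by session_id before the forward fill keeps a name from leaking from one session into the next.

```python
import pandas as pd

# Hypothetical toy data; reset_flag is assumed to be boolean.
df = pd.DataFrame({
    "session_id": ["a", "a", "a", "a", "b", "b", "b"],
    "name": ["n1", "n1", "n2", "n3", "n1", "n2", "n3"],
    "reset_flag": [True, False, False, True, True, False, True],
})

# Keep the name only on reset rows; every other row becomes missing.
marked = df["name"].where(df["reset_flag"])

# Forward-fill within each session so values never cross a session boundary.
df["new_name"] = marked.groupby(df["session_id"]).ffill()
```

If reset_flag actually holds strings such as 'TRUE', it would need to be converted first, for example with df["reset_flag"] == "TRUE".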
CodePudding user response:
An efficient way to go about this would be to use cumsum on the "reset_flag" column: this gives you a column of numbers that increases every time a True is encountered.
You can then simply group by this column to get the desired result (I'm assuming your "reset_flag" column is boolean):
df["new_name"] = df.groupby(df["reset_flag"].cumsum())["name"].transform("first")
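Since the question asks for the name to carry forward only WITHIN a session, one possible variation is to add session_id to the grouping keys. The sketch below is an assumption-laden illustration, not a tested drop-in: the toy data and the name block are made up, and reset_flag is again assumed to be boolean. Note that a session which starts without a reset gets the name of its own first row, rather than inheriting one from the previous session.

```python
import pandas as pd

# Hypothetical toy data; session "b" deliberately starts without a reset.
df = pd.DataFrame({
    "session_id": ["a", "a", "a", "a", "b", "b", "b"],
    "name": ["n1", "n1", "n2", "n3", "n4", "n5", "n5"],
    "reset_flag": [True, False, False, True, False, True, False],
})

# Running count of resets: each True starts a new block of rows.
block = df["reset_flag"].cumsum().rename("block")

# Group by session and block, then broadcast each group's first name.
df["new_name"] = df.groupby(["session_id", block])["name"].transform("first")
```

With cumsum alone as the key, the first row of session "b" would share a group with the last row of session "a" and pick up its name.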