keep the same name until value = true in another pandas column

Time:01-19

I have a dataframe with 3 columns: session_id, name, reset_flag.

I need to create a new column, new_name, that takes the value of name on each row where reset_flag is True and then keeps that value for the following rows WITHIN the same session, until the next row where reset_flag is True.

I'm not really sure of the best way to approach this.

EDIT: I thought of a way to do this with df.iterrows(), by storing the values in a list and then appending them, but it seems very bulky. Is there a more efficient 'pandas' way?

Sample expected output

session_id name reset_flag new_name
06c97a-bc7-6cc-29f-65978ee8d some_name_1 TRUE some_name_1
06c97a-bc7-6cc-29f-65978ee8d some_name_1 some_name_1
06c97a-bc7-6cc-29f-65978ee8d some_name_1 some_name_1
06c97a-bc7-6cc-29f-65978ee8d some_name_2 TRUE some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_2 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_2 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_3 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_3 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_4 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_4 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_4 some_name_2
06c97a-bc7-6cc-29f-65978ee8d some_name_5 TRUE some_name_5
3943d5-e1e-63e-6c4-aa1899bd9 some_name_1 TRUE some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_1 some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_1 some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_2 some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_2 some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_2 some_name_1
3943d5-e1e-63e-6c4-aa1899bd9 some_name_3 TRUE some_name_3
3943d5-e1e-63e-6c4-aa1899bd9 some_name_3 some_name_3
3943d5-e1e-63e-6c4-aa1899bd9 some_name_4 some_name_3
3943d5-e1e-63e-6c4-aa1899bd9 some_name_4 some_name_3
3943d5-e1e-63e-6c4-aa1899bd9 some_name_4 some_name_3
3943d5-e1e-63e-6c4-aa1899bd9 some_name_5 TRUE some_name_5
3943d5-e1e-63e-6c4-aa1899bd9 some_name_6 some_name_5

CodePudding user response:

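I'm not sure whether there is a more efficient way of doing it, but this should work:

# Start with an empty object column so it can hold the name strings.
df['new_name'] = None

session_name = None

for index, row in df.iterrows():
  # I assume the 'TRUE' in your col is a str.
  if row['reset_flag'] == 'TRUE':
    # Reset row: remember its name as the current one.
    session_name = row['name']
  # Write with df.at; df['new_name'][index] = ... is chained assignment
  # and may not update df itself.
  df.at[index, 'new_name'] = session_name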

CodePudding user response:

Set the name on the reset rows, then forward-fill the gaps (this assumes reset_flag is boolean):

df['new_name'] = df.apply(lambda r: r['name'] if r['reset_flag'] else np.nan, axis=1).ffill()
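The same idea can be sketched without a row-wise apply. This is only an illustration under a few assumptions: reset_flag is a real boolean column, the rows are already in order, and the sample frame and the helper name add_new_name_ffill are made up for the example. Grouping the forward-fill by session_id keeps a name from leaking into the next session.

```python
import pandas as pd


def add_new_name_ffill(df):
    """Keep `name` only on reset rows, then forward-fill within each session."""
    # Rows where reset_flag is False become NaN.
    flagged = df["name"].where(df["reset_flag"])
    out = df.copy()
    # Forward-fill per session so values never cross a session boundary.
    out["new_name"] = flagged.groupby(df["session_id"]).ffill()
    return out


# Hypothetical sample data, much smaller than the table in the question.
sample = pd.DataFrame({
    "session_id": ["a", "a", "a", "a", "b", "b", "b"],
    "name": ["n1", "n1", "n2", "n3", "n1", "n2", "n2"],
    "reset_flag": [True, False, True, False, True, False, False],
})

result = add_new_name_ffill(sample)
print(result)
```

With this version, rows that come before the first reset of a session are left as NaN.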

CodePudding user response:

An efficient way to go about this would be to use cumsum on the "reset_flag" column: this gives you a column of numbers that increases by one every time a True is encountered.

You can then simply group by this column to get the desired result (I'm assuming your "reset_flag" column is boolean):

df["new_name"] = df.groupby(df["reset_flag"].cumsum())["name"].transform("first")
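As a small runnable sketch of the cumsum approach, the example below builds a made-up frame and also adds session_id to the group keys. That extra key is my assumption rather than part of the answer above: it guards against a session that does not start with a reset row. The helper name add_new_name_cumsum and the sample values are hypothetical.

```python
import pandas as pd


def add_new_name_cumsum(df):
    """Label each run of rows by the running count of resets, then take the first name."""
    # The counter goes up by one on every True; renamed to avoid clashing
    # with the existing reset_flag column label.
    block = df["reset_flag"].cumsum().rename("block")
    out = df.copy()
    # Group by session and block so a block never spans two sessions.
    out["new_name"] = df.groupby(["session_id", block])["name"].transform("first")
    return out


# Hypothetical sample data; reset_flag is assumed to be boolean.
sample = pd.DataFrame({
    "session_id": ["a", "a", "a", "a", "b", "b", "b"],
    "name": ["n1", "n1", "n2", "n3", "n1", "n2", "n2"],
    "reset_flag": [True, False, True, False, True, False, False],
})

result = add_new_name_cumsum(sample)
print(result)
```

Note that if a session begins with rows that have no reset yet, this version fills them with the first name seen in that session rather than leaving them empty.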