Home > Software engineering >  How to loop through rows in dataframe from one specific value until another in python?
How to loop through rows in dataframe from one specific value until another in python?

Time:05-03

I have a following dataframe:

Time Tab User Description
27.10.2021 15:58:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User A
27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature
27.10.2021 15:59:00 Tab Alpha [email protected] Cancel Operation
27.10.2021 15:59:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User B
27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature
27.10.2021 16:03:00 Tab Alpha [email protected] Cancel Operation
27.10.2021 16:03:00 Tab Alpha [email protected] Add assembly feature
27.10.2021 16:03:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User A
27.10.2021 16:03:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO opened by User A
27.10.2021 16:15:00 Tab Beta [email protected] Start edit of part studio feature
27.10.2021 16:15:00 Tab Alpha [email protected] Start edit of part studio feature
27.10.2021 16:15:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User B
27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
27.10.2021 16:17:00 Tab Beta [email protected] Tab Beta of type ASSEMBLY opened by User C
27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
27.10.2021 16:17:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO opened by User B
27.10.2021 16:54:00 Tab Delta [email protected] Add assembly feature
27.10.2021 16:55:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO closed by User A
27.10.2021 16:55:00 Tab Delta [email protected] Start edit of part studio feature
27.10.2021 16:55:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO closed by User B
27.10.2021 16:59:00 Tab Delta [email protected] Add assembly feature

How to loop through rows in dataframe based on a condition in the "Descripiton" column? The condition is to extract said values (Tab names) between values "Tab 'Tab_name' of type ... opened by User B" and "Tab 'Tab_name' of type ... closed by User B" to "User B" column.

Expected output:

Time Tab User Description UserB
27.10.2021 15:58:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User A
27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature
27.10.2021 15:59:00 Tab Alpha [email protected] Cancel Operation
27.10.2021 15:59:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User B
27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature Tab Alpha
27.10.2021 16:03:00 Tab Alpha [email protected] Cancel Operation Tab Alpha
27.10.2021 16:03:00 Tab Alpha [email protected] Add assembly feature Tab Alpha
27.10.2021 16:03:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User A Tab Alpha
27.10.2021 16:03:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO opened by User A Tab Alpha
27.10.2021 16:15:00 Tab Beta [email protected] Start edit of part studio feature Tab Alpha
27.10.2021 16:15:00 Tab Alpha [email protected] Start edit of part studio feature Tab Alpha
27.10.2021 16:15:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User B
27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
27.10.2021 16:17:00 Tab Beta [email protected] Tab Beta of type ASSEMBLY opened by User C
27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
27.10.2021 16:17:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO opened by User B
27.10.2021 16:54:00 Tab Delta [email protected] Add assembly feature Tab Delta
27.10.2021 16:55:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO closed by User A Tab Delta
27.10.2021 16:55:00 Tab Delta [email protected] Start edit of part studio feature Tab Delta
27.10.2021 16:55:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO closed by User B
27.10.2021 16:59:00 Tab Beta [email protected] Add assembly feature

Here is my try:

df.insert(4, 'User B', '0')

for index, row in df.iterrows():
    if row['Description'].find('opened by User B') != -1:
        tab = row['Description'].rpartition(' of type')[0]

    if row['Description'].find('closed by User B') == -1:
        df.at[index, 'User B'] = tab

Current code extracts tab name from each row where "Description" column does not contain "closed by User B". It should extract only from rows in range specified earlier in the question - between rows where value in "Description" column == "...opened by User B" and "...closed by User B". I'm aware that df.iterrows is not recommended, however I can't seem to find another way.

CodePudding user response:

Try finding all "opened" and "closed" descriptions and selecting slicing between them:

opens = df["Description"].str.contains(".*Tab.*of type.*opened by User B")
closes = df["Description"].str.contains(".*Tab.*of type.*closed by User B")

df["UserB"] = df["Tab"].where((opens.cumsum().shift().fillna(0)-closes.cumsum()).astype(bool))

>>> df
                   Time  ...      UserB
0   27.10.2021 15:58:00  ...        NaN
1   27.10.2021 15:59:00  ...        NaN
2   27.10.2021 15:59:00  ...        NaN
3   27.10.2021 15:59:00  ...        NaN
4   27.10.2021 15:59:00  ...  Tab Alpha
5   27.10.2021 16:03:00  ...  Tab Alpha
6   27.10.2021 16:03:00  ...  Tab Alpha
7   27.10.2021 16:03:00  ...  Tab Alpha
8   27.10.2021 16:03:00  ...   Tab Beta
9   27.10.2021 16:15:00  ...   Tab Beta
10  27.10.2021 16:15:00  ...  Tab Alpha
11  27.10.2021 16:15:00  ...        NaN
12  27.10.2021 16:17:00  ...        NaN
13  27.10.2021 16:17:00  ...        NaN
14  27.10.2021 16:17:00  ...        NaN
15  27.10.2021 16:17:00  ...        NaN
16  27.10.2021 16:54:00  ...  Tab Delta
17  27.10.2021 16:55:00  ...   Tab Beta
18  27.10.2021 16:55:00  ...  Tab Delta
19  27.10.2021 16:55:00  ...        NaN
20  27.10.2021 16:59:00  ...        NaN

[21 rows x 5 columns]

CodePudding user response:

Another solution:

df["User B"] = df["Description"].str.contains("opened by User B").shift()
df["User B"]  = df["Description"].str.contains("closed by User B") * -1

df["tmp"] = (
    df["Description"]
    .str.extract("(Tab .*?) of type .* opened by User B")
    .ffill()
)

df["User B"] = np.where(df["User B"].fillna(0).cumsum(), df["tmp"], "")
df = df.drop(columns="tmp")

print(df.to_markdown())

Prints:

Time Tab User Description User B
0 27.10.2021 15:58:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User A
1 27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature
2 27.10.2021 15:59:00 Tab Alpha [email protected] Cancel Operation
3 27.10.2021 15:59:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO opened by User B
4 27.10.2021 15:59:00 Tab Alpha [email protected] Start edit of part studio feature Tab Alpha
5 27.10.2021 16:03:00 Tab Alpha [email protected] Cancel Operation Tab Alpha
6 27.10.2021 16:03:00 Tab Alpha [email protected] Add assembly feature Tab Alpha
7 27.10.2021 16:03:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User A Tab Alpha
8 27.10.2021 16:03:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO opened by User A Tab Alpha
9 27.10.2021 16:15:00 Tab Beta [email protected] Start edit of part studio feature Tab Alpha
10 27.10.2021 16:15:00 Tab Alpha [email protected] Start edit of part studio feature Tab Alpha
11 27.10.2021 16:15:00 Tab Alpha [email protected] Tab Alpha of type PARTSTUDIO closed by User B
12 27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
13 27.10.2021 16:17:00 Tab Beta [email protected] Tab Beta of type ASSEMBLY opened by User C
14 27.10.2021 16:17:00 Tab Beta [email protected] Add assembly feature
15 27.10.2021 16:17:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO opened by User B
16 27.10.2021 16:54:00 Tab Delta [email protected] Add assembly feature Tab Delta
17 27.10.2021 16:55:00 Tab Beta [email protected] Tab Beta of type PARTSTUDIO closed by User A Tab Delta
18 27.10.2021 16:55:00 Tab Delta [email protected] Start edit of part studio feature Tab Delta
19 27.10.2021 16:55:00 Tab Delta [email protected] Tab Delta of type PARTSTUDIO closed by User B
20 27.10.2021 16:59:00 Tab Delta [email protected] Add assembly feature
  • Related