I have a following dataframe:
Time | Tab | User | Description |
---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User A |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Cancel Operation |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User B |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Cancel Operation |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Add assembly feature |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User A |
27.10.2021 16:03:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO opened by User A |
27.10.2021 16:15:00 | Tab Beta | [email protected] | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Start edit of part studio feature |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User B |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY opened by User C |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature |
27.10.2021 16:17:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO opened by User B |
27.10.2021 16:54:00 | Tab Delta | [email protected] | Add assembly feature |
27.10.2021 16:55:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO closed by User A |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Start edit of part studio feature |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO closed by User B |
27.10.2021 16:59:00 | Tab Delta | [email protected] | Add assembly feature |
How to loop through rows in dataframe based on a condition in the "Descripiton" column? The condition is to extract said values (Tab names) between values "Tab 'Tab_name' of type ... opened by User B" and "Tab 'Tab_name' of type ... closed by User B" to "User B" column.
Expected output:
Time | Tab | User | Description | UserB |
---|---|---|---|---|
27.10.2021 15:58:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User A | |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Cancel Operation | |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User B | |
27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Cancel Operation | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Add assembly feature | Tab Alpha |
27.10.2021 16:03:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha |
27.10.2021 16:03:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO opened by User A | Tab Alpha |
27.10.2021 16:15:00 | Tab Beta | [email protected] | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha |
27.10.2021 16:15:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User B | |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY opened by User C | |
27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | |
27.10.2021 16:17:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO opened by User B | |
27.10.2021 16:54:00 | Tab Delta | [email protected] | Add assembly feature | Tab Delta |
27.10.2021 16:55:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO closed by User A | Tab Delta |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Start edit of part studio feature | Tab Delta |
27.10.2021 16:55:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO closed by User B | |
27.10.2021 16:59:00 | Tab Beta | [email protected] | Add assembly feature |
Here is my try:
df.insert(4, 'User B', '0')
for index, row in df.iterrows():
if row['Description'].find('opened by User B') != -1:
tab = row['Description'].rpartition(' of type')[0]
if row['Description'].find('closed by User B') == -1:
df.at[index, 'User B'] = tab
Current code extracts tab name from each row where "Description" column does not contain "closed by User B". It should extract only from rows in range specified earlier in the question - between rows where value in "Description" column == "...opened by User B" and "...closed by User B". I'm aware that df.iterrows is not recommended, however I can't seem to find another way.
CodePudding user response:
Try finding all "opened" and "closed" descriptions and selecting slicing between them:
opens = df["Description"].str.contains(".*Tab.*of type.*opened by User B")
closes = df["Description"].str.contains(".*Tab.*of type.*closed by User B")
df["UserB"] = df["Tab"].where((opens.cumsum().shift().fillna(0)-closes.cumsum()).astype(bool))
>>> df
Time ... UserB
0 27.10.2021 15:58:00 ... NaN
1 27.10.2021 15:59:00 ... NaN
2 27.10.2021 15:59:00 ... NaN
3 27.10.2021 15:59:00 ... NaN
4 27.10.2021 15:59:00 ... Tab Alpha
5 27.10.2021 16:03:00 ... Tab Alpha
6 27.10.2021 16:03:00 ... Tab Alpha
7 27.10.2021 16:03:00 ... Tab Alpha
8 27.10.2021 16:03:00 ... Tab Beta
9 27.10.2021 16:15:00 ... Tab Beta
10 27.10.2021 16:15:00 ... Tab Alpha
11 27.10.2021 16:15:00 ... NaN
12 27.10.2021 16:17:00 ... NaN
13 27.10.2021 16:17:00 ... NaN
14 27.10.2021 16:17:00 ... NaN
15 27.10.2021 16:17:00 ... NaN
16 27.10.2021 16:54:00 ... Tab Delta
17 27.10.2021 16:55:00 ... Tab Beta
18 27.10.2021 16:55:00 ... Tab Delta
19 27.10.2021 16:55:00 ... NaN
20 27.10.2021 16:59:00 ... NaN
[21 rows x 5 columns]
CodePudding user response:
Another solution:
df["User B"] = df["Description"].str.contains("opened by User B").shift()
df["User B"] = df["Description"].str.contains("closed by User B") * -1
df["tmp"] = (
df["Description"]
.str.extract("(Tab .*?) of type .* opened by User B")
.ffill()
)
df["User B"] = np.where(df["User B"].fillna(0).cumsum(), df["tmp"], "")
df = df.drop(columns="tmp")
print(df.to_markdown())
Prints:
Time | Tab | User | Description | User B | |
---|---|---|---|---|---|
0 | 27.10.2021 15:58:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User A | |
1 | 27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | |
2 | 27.10.2021 15:59:00 | Tab Alpha | [email protected] | Cancel Operation | |
3 | 27.10.2021 15:59:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO opened by User B | |
4 | 27.10.2021 15:59:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha |
5 | 27.10.2021 16:03:00 | Tab Alpha | [email protected] | Cancel Operation | Tab Alpha |
6 | 27.10.2021 16:03:00 | Tab Alpha | [email protected] | Add assembly feature | Tab Alpha |
7 | 27.10.2021 16:03:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User A | Tab Alpha |
8 | 27.10.2021 16:03:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO opened by User A | Tab Alpha |
9 | 27.10.2021 16:15:00 | Tab Beta | [email protected] | Start edit of part studio feature | Tab Alpha |
10 | 27.10.2021 16:15:00 | Tab Alpha | [email protected] | Start edit of part studio feature | Tab Alpha |
11 | 27.10.2021 16:15:00 | Tab Alpha | [email protected] | Tab Alpha of type PARTSTUDIO closed by User B | |
12 | 27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | |
13 | 27.10.2021 16:17:00 | Tab Beta | [email protected] | Tab Beta of type ASSEMBLY opened by User C | |
14 | 27.10.2021 16:17:00 | Tab Beta | [email protected] | Add assembly feature | |
15 | 27.10.2021 16:17:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO opened by User B | |
16 | 27.10.2021 16:54:00 | Tab Delta | [email protected] | Add assembly feature | Tab Delta |
17 | 27.10.2021 16:55:00 | Tab Beta | [email protected] | Tab Beta of type PARTSTUDIO closed by User A | Tab Delta |
18 | 27.10.2021 16:55:00 | Tab Delta | [email protected] | Start edit of part studio feature | Tab Delta |
19 | 27.10.2021 16:55:00 | Tab Delta | [email protected] | Tab Delta of type PARTSTUDIO closed by User B | |
20 | 27.10.2021 16:59:00 | Tab Delta | [email protected] | Add assembly feature |