Suppose I have a dataframe A.
ID action
1 A
2 B
3 C
4 D
5 E
6 A
7 B
8 C
...
And I would like to find the following pattern in dataframe A:
[A,B,C]
I would like to count how often this sublist occurs and find the ID of each of its elements.
The output below is just an idea; it can be a dict or a list, I am not sure which type is better.
The frequency is 2:
[A:1,B:2,C:3]
[A:6,B:7,C:8]
CodePudding user response:
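# df is the dataframe from the question
mask = (df.action.shift(1, fill_value='')
          .add(df.action)
          .add(df.action.shift(-1, fill_value=''))
          .eq('ABC'))
# mask is True on each 'B' row that sits directly between an 'A' and a 'C';
# the x-1 / x+1 arithmetic assumes the IDs are consecutive integers
output = df.loc[mask, 'ID'].apply(lambda x: f'[A:{x-1},B:{x},C:{x+1}]')
print(f'The frequency is {len(output)}:')
for x in output:
    print(x)
Output:
The frequency is 2:
[A:1,B:2,C:3]
[A:6,B:7,C:8]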
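The same idea can be generalised to a pattern of any length without concatenating strings or doing arithmetic on the IDs. The following is only a sketch: the helper name find_pattern and its parameters are made up for illustration, and the sample frame is rebuilt from the eight rows shown in the question (the rows hidden behind "..." are not included).

```python
import pandas as pd

def find_pattern(df, pattern, key="action", id_col="ID"):
    """Return one list of (action, ID) pairs per contiguous match of pattern."""
    n = len(pattern)
    # Start with every row as a candidate start, then require that the row
    # `offset` positions further down holds the offset-th pattern element.
    mask = pd.Series(True, index=df.index)
    for offset, value in enumerate(pattern):
        mask &= df[key].shift(-offset).eq(value)
    # Positions (not index labels) of the rows where a full match begins.
    starts = [pos for pos, hit in enumerate(mask.to_numpy()) if hit]
    ids = df[id_col].tolist()
    return [list(zip(pattern, ids[s:s + n])) for s in starts]

# Sample data assumed from the question.
df = pd.DataFrame({"ID": range(1, 9), "action": list("ABCDEABC")})

matches = find_pattern(df, ["A", "B", "C"])
print(f"The frequency is {len(matches)}:")
for match in matches:
    print(match)
```

Because the IDs are read from the matching rows themselves, this version does not depend on the IDs being consecutive, and the action labels may be longer than one character.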
CodePudding user response:
As I understand your question, you want to find the rows of the dataframe that match a pattern.
Let's assume this is your pattern:
pattern = ['A', 'B', 'C']
Then select the rows whose value in the action column appears in the pattern list:
new_df = df[df.action.isin(pattern)]
print(new_df)
Output:
   ID action
0   1      A
1   2      B
2   3      C
5   6      A
6   7      B
7   8      C
Then convert the filtered dataframe into a list of (ID, action) tuples:
# "ids" rather than "id", so the built-in id() is not shadowed
ids = new_df.ID.tolist()
actions = new_df.action.tolist()
list(zip(ids, actions))
This gives you a list of paired items, each one a tuple:
[(1, 'A'), (2, 'B'), (3, 'C'), (6, 'A'), (7, 'B'), (8, 'C')]
N.B.: If you change the desired output, I can adapt my solution accordingly.
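One caveat with the isin filter: it only checks membership, so it does not confirm that A, B and C appear in that order or on adjacent rows. A possible way to get the grouped output the question asks for is to slide a window over the (ID, action) pairs of the whole frame. This is a rough sketch; the sample data is assumed from the question, and the dict form assumes the pattern contains no repeated action.

```python
import pandas as pd

# Sample data assumed from the eight rows shown in the question.
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6, 7, 8],
                   "action": list("ABCDEABC")})
pattern = ["A", "B", "C"]

# Pair every ID with its action, keeping the original row order.
pairs = list(zip(df["ID"].tolist(), df["action"].tolist()))

n = len(pattern)
matches = []
for start in range(len(pairs) - n + 1):
    window = pairs[start:start + n]
    # Keep the window only if its actions equal the pattern, in order.
    if [action for _, action in window] == pattern:
        matches.append({action: row_id for row_id, action in window})

print(f"The frequency is {len(matches)}:")
for match in matches:
    print(match)
```

Each match is a dict mapping an action to its ID, and len(matches) is the frequency.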