Python: count the frequency of a sublist in a given dataframe and find the associated values

Time:07-17

Suppose I have a dataframe A.

ID   action
1      A
2      B
3      C
4      D
5      E
6      A
7      B
8      C
...

And I would like to find the following pattern in dataframe A:

[A,B,C]

I would like to count how often this sublist occurs and find the ID of each of its elements.

Expected output (this is only a rough idea; the result can be a dict or a list, I am not sure which type is better):

the frequency is 2:
[A:1,B:2,C:3]
[A:6, B:7,C:8]

CodePudding user response:

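# Concatenate the previous, current and next action for every row;
# the middle row of each A, B, C run then reads 'ABC'.
mask = (df.action.shift(1, fill_value='')
          .add(df.action)
          .add(df.action.shift(-1, fill_value=''))
          .eq('ABC'))
# x is the ID of the middle row (B); this assumes consecutive integer IDs.
output = df.loc[mask, 'ID'].apply(lambda x: f'[A:{x-1},B:{x},C:{x+1}]')
print(f'The frequency is {len(output)}:')
for x in output:
    print(x)

Output:

The frequency is 2:
[A:1,B:2,C:3]
[A:6,B:7,C:8]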
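
The string comparison above relies on single-character actions and on IDs that increase by one. A minimal sketch of a more general variant follows; the helper name find_pattern and its parameters are hypothetical, and the dataframe is rebuilt from the eight rows shown in the question. It compares each shifted copy of the action column with the matching pattern element, then reads the IDs by position instead of computing them.

```python
import pandas as pd

def find_pattern(df, pattern, key="action", id_col="ID"):
    """Return one list of (action, ID) pairs per contiguous, in-order match.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    n = len(pattern)
    # Every row starts out as a candidate for the first pattern element.
    mask = pd.Series(True, index=df.index)
    for offset, value in enumerate(pattern):
        # shift(-offset) lines up the row `offset` steps ahead with the start row;
        # rows shifted in from beyond the end are missing and compare as False.
        mask &= df[key].shift(-offset).eq(value)
    ids = df[id_col].tolist()
    starts = [pos for pos, hit in enumerate(mask.tolist()) if hit]
    return [list(zip(pattern, ids[s:s + n])) for s in starts]

# Sample data: the eight rows shown in the question.
df = pd.DataFrame({"ID": range(1, 9), "action": list("ABCDEABC")})
matches = find_pattern(df, ["A", "B", "C"])
print(f"The frequency is {len(matches)}:")
for m in matches:
    print(m)
```

Returning pairs rather than a dict keeps patterns with repeated actions (for example A, B, A) intact, since a dict would overwrite the first A with the second.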

CodePudding user response:

As I understand your question, you want to find the rows of the dataframe that match a pattern.

Let's assume this is your pattern:

pattern = ['A', 'B', 'C']

Then select the rows whose value in the action column appears in the pattern list:

new_df = df[df.action.isin(pattern)]

print(new_df)

output:

   ID   action
0   1   A
1   2   B
2   3   C
5   6   A
6   7   B
7   8   C

Then convert the filtered dataframe into a list of (ID, action) tuples:

# Named ids rather than id to avoid shadowing the built-in id().
ids = list(new_df.ID)
actions = list(new_df.action)
list(zip(ids, actions))

You will get a list of (ID, action) pairs:

[(1, 'A'), (2, 'B'), (3, 'C'), (6, 'A'), (7, 'B'), (8, 'C')]

N.B.: If you change the expected output, I can adapt my solution accordingly.
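
Note that isin only checks membership, so the filtered list does not by itself say whether the actions occur in order or next to each other. A rough sketch of how the count could be finished in plain Python is shown below. It assumes the pairs are taken from the unfiltered rows (so that adjacency is preserved), and count_pattern is a hypothetical name chosen for illustration.

```python
def count_pattern(pairs, pattern):
    """Slide a window over (ID, action) pairs and keep exact, in-order matches."""
    n = len(pattern)
    matches = []
    for start in range(len(pairs) - n + 1):
        window = pairs[start:start + n]
        # Compare only the actions of the window with the pattern.
        if [action for _, action in window] == list(pattern):
            matches.append(window)
    return matches

# The eight rows shown in the question, unfiltered. With a dataframe this
# list could be built as list(zip(df.ID, df.action)).
pairs = [(1, "A"), (2, "B"), (3, "C"), (4, "D"),
         (5, "E"), (6, "A"), (7, "B"), (8, "C")]
matches = count_pattern(pairs, ["A", "B", "C"])
print(f"The frequency is {len(matches)}:")
for m in matches:
    print(m)
```

Each match keeps the IDs alongside the actions, so the result can be turned into a dict or a formatted string as needed.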
