I have dataframe like below
df = pd.DataFrame([[1,'A','10:10'],[1,'B','10:10'],[1,'C','10:10'],[1,'D','10:10'],[1,'E','2:11'],[2,'B','12:01'],
[2,'C','12:01'],[3,'A','13:22']],columns=['StudentNo','Subject','Time'])
StudentNo | Subject | Time |
---|---|---|
1 | A | 10:10 |
1 | B | 10:10 |
1 | C | 10:10 |
1 | D | 10:10 |
1 | E | 2:11 |
1 | B | 2:11 |
2 | B | 12:01 |
2 | C | 12:01 |
3 | A | 13:22 |
Required as below
StudentNo | Subject | Time |
---|---|---|
1 | A | 10:10 |
1 | B | 10:10 |
1 | C | 10:10 |
1 | D | 10:10 |
1 | E | 2:11 |
1 | B | 2:11 |
1 | A:B | 10:10 |
1 | A:C | 10:10 |
1 | A:D | 10:10 |
1 | B:C | 10:10 |
1 | B:D | 10:10 |
1 | C:D | 10:10 |
1 | E:B | 2:11 |
2 | B | 12:01 |
2 | C | 12:01 |
2 | B:C | 12:01 |
3 | A | 13:22 |
Along with the existing row, the Subject combination rows need to be added as new rows based on the groupby [StudentNo,Time]
I have done this using a function(studentNo,df) - which returns a new dataframe with the subject combination rows for each student number passed This functions is being called for each studentNo in FOR loop and keep Appending the returned dataframe in a new dataframe and finally appending with the main dataframe. This code is of around 30 lines of code.
But there could be better way to do this. Please suggest.
CodePudding user response:
This sounds like you want to use itertools.combinations
on the grouped values. You can then append that back to your original df.
import pandas as pd
from itertools import combinations
df = pd.DataFrame({'StudentNo': [1, 1, 1, 1, 1, 1, 2, 2, 3],
'Subject': ['A', 'B', 'C', 'D', 'E', 'B', 'B', 'C', 'A'],
'Time': ['10:10',
'10:10',
'10:10',
'10:10',
'2:11',
'2:11',
'12:01',
'12:01',
'13:22']})
df2 = df.groupby(['StudentNo','Time'])['Subject'].apply(lambda x: list(combinations(set(x),2))).explode().reset_index().dropna()
df2['Subject'] = df2['Subject'].apply(lambda x: ':'.join(sorted(x)))
output = pd.concat([df,df2]).reset_index(drop=True).sort_values(by='StudentNo')
print(output)
Output
StudentNo Subject Time
0 1 A 10:10
1 1 B 10:10
2 1 C 10:10
3 1 D 10:10
4 1 E 2:11
5 1 B 2:11
14 1 A:B 10:10
13 1 B:D 10:10
15 1 B:E 2:11
9 1 C:D 10:10
10 1 A:C 10:10
11 1 B:C 10:10
12 1 A:D 10:10
16 2 B:C 12:01
6 2 B 12:01
7 2 C 12:01
8 3 A 13:22