I have a data frame like this,
|ID |Pass | Fail | Pass_student_ID |Failed_student_ID|
|:-:|:---:|:----:|:---------------:|:---------------:|
| 1 | 2 | 0 | [101,102] | 0 |
| 2 | 1 | 1 | [102] | 101 |
| 3 | 0 | 1 | [101,102] | 0 |
I want an output like this
| ID|Student_ID|Status|
|:-:|:--------:|:----:|
| 1 | 101 | Pass |
| 1 | 102 | Pass |
| 2 | 101 | Fail |
| 2 | 102 | Pass |
| 3 | 101 | Fail |
| 3 | 102 | Fail |
How do I write in python to get this kind of data frame?
CodePudding user response:
IIUC, you can explode
, then reshape with melt
:
(df.explode('Pass_student_ID')
[['ID', 'Pass_student_ID', 'Failed_student_ID']]
.rename(columns={'Pass_student_ID': 'Pass', 'Failed_student_ID': 'Fail'})
.melt('ID', value_name='Student_ID', var_name='Status')
.loc[lambda d: d['Student_ID'].ne(0)]
.sort_values(by='ID')
)
output:
ID Status Student_ID
0 1 Pass 101
1 1 Pass 102
2 2 Pass 102
7 2 Fail 101
3 3 Pass 101
4 3 Pass 102
CodePudding user response:
Use DataFrame.drop
for remove Pass/Fail
columns, unpivot by DataFrame.melt
, convert lists to scalars by DataFrame.explode
, filter out 0
values and for Status
filter first 4
values of columns names, last sorting by DataFrame.sort_values
:
import ast
#if lists are string representation
df['Pass_student_ID'] = df['Pass_student_ID'].apply(ast.literal_eval)
df1 = (df.drop(['Pass','Fail'],axis=1)
.melt('ID', value_name='Student_ID', var_name='Status')
.explode('Student_ID')
.loc[lambda x: x['Student_ID'].ne(0)]
.assign(Status = lambda x: x['Status'].str[:4])
.sort_values('ID', ignore_index=True))
print (df1)
ID Status Student_ID
0 1 Pass 101
1 1 Pass 102
2 2 Pass 102
3 2 Fail 101
4 3 Pass 101
5 3 Pass 102
If multiple columns in DataFrame is possible select values by subset instead drop
:
df1 = (df[['ID','Pass_student_ID','Failed_student_ID']]
.melt('ID', value_name='Student_ID', var_name='Status')
.explode('Student_ID')
.loc[lambda x: x['Student_ID'].ne(0)]
.assign(Status = lambda x: x['Status'].str[:4])
.sort_values('ID', ignore_index=True))
print (df1)
ID Status Student_ID
0 1 Pass 101
1 1 Pass 102
2 2 Pass 102
3 2 Fail 101
4 3 Pass 101
5 3 Pass 102