Home > database >  Filter and group list of duplicate names
Filter and group list of duplicate names

Time:11-18

How do I change the shape of this data so that the subject column becomes a top row header containing all unique values. Name and Surnames are listed in the 1st and 2nd column as unique values. Then in each cell I need a true or false of whether the person is in that subject class. I need to transpose or reshape the data but how on earth is this done in code?

SUBJECT         NAME        SURNAME
Art             person1     Surname1
Art             person2     surname2
Art             person3     Surname3
Art             person4     Surname4
Art             person5     Surname5
Art             person6     Surname6
Art             person7     Surname7
Art             person8     Surname8
DT              person1     Surname1
DT              person3     Surname3
DT              person5     Surname5
Photography     person1     Surname1
Photography     person2     surname2
Photography     person3     Surname3
Photography     person5     Surname5
Photography     person8     Surname8
Games           person4     Surname4
Games           person5     Surname5
Games           person6     Surname6
Games           person7     Surname7
Games           person8     Surname8
Games           person9     Surname9

So that it looks like this:

    Name    Surname  Art    DT    Photography    Games
    person1 Surname1 True   False True          etc.... 
    person2 surname2 False  True  False         etc...
    person3 Surname3                    
    person4 Surname4                    
    person5 Surname5                    
    person6 Surname6                    
    person7 Surname7                    
    person8 Surname8                    
    person9 Surname9

CodePudding user response:

Use:

(df.value_counts().unstack(0)
 .notnull()
 .reindex(columns=df['SUBJECT'].unique())
 .reset_index()
 .rename_axis(columns=None))

Output:

    NAME    SURNAME     Art     DT      Photography Games
0   person1 Surname1    True    True    True        False
1   person2 surname2    True    False   True        False
2   person3 Surname3    True    True    True        False
3   person4 Surname4    True    False   False       True
4   person5 Surname5    True    True    True        True
5   person6 Surname6    True    False   False       True
6   person7 Surname7    True    False   False       True
7   person8 Surname8    True    False   True        True
8   person9 Surname9    False   False   False       True

CodePudding user response:

This is a crosstab converted to boolean:

out = (pd
 .crosstab([df['NAME'], df['SURNAME']], df['SUBJECT'])
 .astype(bool)
 .reset_index().rename_axis(columns=None)
)

Output:

      NAME   SURNAME    Art     DT  Games  Photography
0  person1  Surname1   True   True  False         True
1  person2  surname2   True  False  False         True
2  person3  Surname3   True   True  False         True
3  person4  Surname4   True  False   True        False
4  person5  Surname5   True   True   True         True
5  person6  Surname6   True  False   True        False
6  person7  Surname7   True  False   True        False
7  person8  Surname8   True  False   True         True
8  person9  Surname9  False  False   True        False
  • Related