How can I manipulate list elements inside columns of a pandas DataFrame?

Time:03-15

I have a DataFrame:

               RR                    AA                  SS         LL
 C1     [C1, C2, C3, C4, C5]        [C1]                [C1]    
 C2     [C2, C3, C5]            [C1, C2, C3, C5]    [C5, C3, C2]    I
 C3     [C2, C3, C4, C5]        [C1, C2, C3, C5]    [C5, C3, C2]    
 C4           [C4]              [C1, C3, C4, C5]        [C4]        I
 C5     [C2, C3, C4, C5]        [C1, C2, C3, C5]    [C5, C3, C2]    

I want to delete every row where LL is I (here, rows C2 and C4). I also need to remove the elements C2 and C4 from the lists in RR, AA and SS of the remaining rows, so that the output looks like this:

            RR               AA            SS         LL
 C1     [C1, C3, C5]        [C1]          [C1]  
 C3     [C3, C5]        [C1, C3, C5]    [C5, C3]    
 C5     [C3, C5]        [C1, C3, C5]    [C5, C3]    

I tried this code, but it only deletes the rows; it does not remove C2 and C4 from the lists in RR, AA and SS.

ix = df.RR.apply(set) == df.SS.apply(set)
df.loc[~ix]

The output I get still has C2 and C4 in the lists in RR, AA and SS, which I don't want:

               RR                    AA                  SS         LL
 C1     [C1, C2, C3, C4, C5]        [C1]                [C1]    
 C3     [C2, C3, C4, C5]        [C1, C2, C3, C5]    [C5, C3, C2]    
 C5     [C2, C3, C4, C5]        [C1, C2, C3, C5]    [C5, C3, C2]    

CodePudding user response:

This should do it:

new_df = df.loc[df['LL'] != 'I', ['RR', 'AA', 'SS']].applymap(set).apply(lambda col: col - {'C2', 'C4'}).applymap(list)

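Output (printed as sets, i.e. before the final .applymap(list) step; sets are unordered, so the original element order of each list is not preserved):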

>>> new_df
              RR            AA        SS
C1  {C5, C3, C1}          {C1}      {C1}
C3      {C5, C3}  {C1, C5, C3}  {C5, C3}
C5      {C5, C3}  {C1, C5, C3}  {C5, C3}
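An order-preserving variant, as a minimal sketch: it assumes the labels C1 to C5 are the DataFrame's index (as in the question's printout) and derives the labels to remove from the LL column instead of hard-coding them. The names flagged, removed and result are illustrative, not from the original answer.

```python
import pandas as pd

# Rebuild the question's frame, assuming C1..C5 is the index
df = pd.DataFrame(
    {
        "RR": [["C1", "C2", "C3", "C4", "C5"], ["C2", "C3", "C5"],
               ["C2", "C3", "C4", "C5"], ["C4"], ["C2", "C3", "C4", "C5"]],
        "AA": [["C1"], ["C1", "C2", "C3", "C5"], ["C1", "C2", "C3", "C5"],
               ["C1", "C3", "C4", "C5"], ["C1", "C2", "C3", "C5"]],
        "SS": [["C1"], ["C5", "C3", "C2"], ["C5", "C3", "C2"], ["C4"],
               ["C5", "C3", "C2"]],
        "LL": ["", "I", "", "I", ""],
    },
    index=["C1", "C2", "C3", "C4", "C5"],
)

flagged = df["LL"] == "I"            # rows to drop
removed = set(df.index[flagged])     # labels to strip from the lists

result = df.loc[~flagged].copy()
for col in ["RR", "AA", "SS"]:
    # A list comprehension keeps the original order of the surviving elements
    result[col] = result[col].apply(
        lambda items: [x for x in items if x not in removed]
    )

print(result)
```

Because the lists are filtered rather than converted to sets, SS keeps its original ordering (C5 before C3), and LL stays in the result.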

CodePudding user response:

import pandas as pd

col1 = ['C1','C2','C3','C4','C5']
RR = [['C1', 'C2', 'C3', 'C4', 'C5'], ['C2', 'C3', 'C5'], ['C2', 'C3', 'C4', 'C5'], 
        ['C4'], ['C2', 'C3', 'C4', 'C5']]
AA = [['C1'], ['C1', 'C2', 'C3', 'C5'], ['C1', 'C2', 'C3', 'C5'], ['C1', 'C3', 'C4', 'C5'], 
        ['C1', 'C2', 'C3', 'C5']]
SS = [['C1'], ['C5', 'C3', 'C2'], ['C5', 'C3', 'C2'], ['C4'], ['C5', 'C3', 'C2']]
LL = ['','I','','I','']

df1 = pd.DataFrame({'col1':col1, 'RR':RR,'AA':AA, 'SS':SS, 'LL':LL})

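# Rows flagged with 'I' in LL; their col1 values are the labels to remove
removing_row = df1.loc[df1['LL'] == 'I', 'col1']
removing_values = removing_row.values

# Drop the flagged rows by index label
df1.drop(removing_row.index, inplace=True, axis=0)

# Remove the flagged labels from every remaining list
for col in ['RR','AA','SS']:
    for i, j in df1[col].items():  # Series.iteritems() was removed in pandas 2.0
        for k in removing_values:
            if k in j:
                j.remove(k)  # mutates the stored list in place, so no reassignment is needed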

print(df1)
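
The same clean-up can also be sketched without an explicit loop over the rows, using Series.explode and groupby. This is only an illustration under two assumptions: the labels are the index (as in the question's printout), and no remaining list ends up empty (a row whose list lost every element would come back as NaN after the reindex). The names flagged, removed and kept are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "RR": [["C1", "C2", "C3", "C4", "C5"], ["C2", "C3", "C5"],
               ["C2", "C3", "C4", "C5"], ["C4"], ["C2", "C3", "C4", "C5"]],
        "AA": [["C1"], ["C1", "C2", "C3", "C5"], ["C1", "C2", "C3", "C5"],
               ["C1", "C3", "C4", "C5"], ["C1", "C2", "C3", "C5"]],
        "SS": [["C1"], ["C5", "C3", "C2"], ["C5", "C3", "C2"], ["C4"],
               ["C5", "C3", "C2"]],
        "LL": ["", "I", "", "I", ""],
    },
    index=["C1", "C2", "C3", "C4", "C5"],
)

flagged = df["LL"] == "I"
removed = df.index[flagged]          # labels of the rows being dropped
kept = df.loc[~flagged].copy()

for col in ["RR", "AA", "SS"]:
    exploded = kept[col].explode()                # one row per list element
    filtered = exploded[~exploded.isin(removed)]  # discard the removed labels
    # Collect the surviving elements back into one list per original row
    kept[col] = filtered.groupby(level=0).agg(list).reindex(kept.index)

print(kept)
```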