How to update dataframe cell value based on values in other columns?

Time:07-07

I have a pandas dataframe (called removedCols) of ~2000 rows, and I am trying to populate certain columns in it using the values in each row's labels cell. An excerpt of the original dataframe:

 A      B      C      D     labels
 0      0      0      0     ['D', 'C']
 0      0      0      0     []
 0      0      0      0     ['A','B','D']
 0      0      0      0     ['D']

My goal is to set to 1, in each row, every column that is named in that row's labels list, so that we get:

 A      B      C      D     labels
 0      0      1      1     ['D', 'C']
 0      0      0      0     []
 1      1      0      1     ['A','B','D']
 0      0      0      1     ['D']

I have tried many different solutions, such as first extracting labels to a list, and iterating over that, or iterating over the indexes of the dataframe.

for i in removedCols.index:
     for value in removedCols.iloc[i]['labels']:
          removedCols.at[i, value] = 1

However, these solutions seem to produce random combinations of 0s and 1s that do not match what is given in the labels column.

UPDATE: Double check your indexes.
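
One way the loop above can go wrong, sketched under the assumption that the index labels no longer match the row positions (for example after rows were filtered or reordered without resetting the index): .iloc[i] reads by position while .at[i, ...] writes by label, so a row can receive another row's labels. The small frame and the names buggy and fixed below are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical frame: index labels (2, 0, 1) differ from row positions (0, 1, 2).
df = pd.DataFrame(
    {
        "A": [0, 0, 0],
        "B": [0, 0, 0],
        "labels": [["A"], [], ["B"]],
    },
    index=[2, 0, 1],
)

# Pattern from the question: i is an index label, but .iloc treats it as a position,
# so the labels are read from one row and written to another.
buggy = df.copy()
for i in buggy.index:
    for value in buggy.iloc[i]["labels"]:
        buggy.at[i, value] = 1

# Consistent pattern: label-based access for both the read and the write.
fixed = df.copy()
for i in fixed.index:
    for value in fixed.at[i, "labels"]:
        fixed.at[i, value] = 1

print(buggy)
print(fixed)
```

If the index is a plain 0..n-1 range, both loops give the same result, which is why checking the index (or calling reset_index(drop=True) first) is a reasonable first step.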

CodePudding user response:

Use DataFrame.update with Series.str.join and Series.str.get_dummies:

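import ast

# Only needed if the labels column holds strings such as "['D', 'C']" rather than real lists:
# df['labels'] = df['labels'].apply(ast.literal_eval)

# Join each list into a string like 'D|C', expand it into 0/1 indicator columns,
# then overwrite the matching columns of df in place
df.update(df['labels'].str.join('|').str.get_dummies())
print(df)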

   A  B  C  D     labels
0  0  0  1  1     [D, C]
1  0  0  0  0         []
2  1  1  0  1  [A, B, D]
3  0  0  0  1        [D]
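
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the four example rows from the question and splits the one-liner into its intermediate steps. The variable names joined and dummies are only for illustration.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0],
        "B": [0, 0, 0, 0],
        "C": [0, 0, 0, 0],
        "D": [0, 0, 0, 0],
        "labels": [["D", "C"], [], ["A", "B", "D"], ["D"]],
    }
)

# Step 1: turn each list into a '|'-separated string (an empty list becomes '')
joined = df["labels"].str.join("|")

# Step 2: one 0/1 indicator column per distinct label ('|' is the default separator)
dummies = joined.str.get_dummies()

# Step 3: overwrite the columns of df that also appear in dummies, aligned on the index
df.update(dummies)

print(df)
```

Note that update overwrites every column present in dummies, zeros included, so any 1s already stored in those columns would be replaced. A label that never occurs in any row produces no indicator column, and that column of df is left untouched.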

CodePudding user response:

Try this:

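for idx, row in df.iterrows():
    for elm in row['labels']:
        # Skip labels that have no matching column
        if elm in df.columns:
            # .at writes into df directly; df[elm][idx] = 1 is chained assignment and may not update df
            df.at[idx, elm] = 1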

Here you iterate through all the rows of df, and for every row you set the columns named in its labels list to 1.
