I have a pandas dataframe (called removedCols) of ~2000 rows, and I am trying to populate certain columns in it using the values in another cell of the same row. An excerpt of the original dataframe looks like this:
A B C D labels
0 0 0 0 ['D', 'C']
0 0 0 0 []
0 0 0 0 ['A','B','D']
0 0 0 0 ['D']
My goal is to set to 1 each column whose name appears in that row's labels column, so that we get:
A B C D labels
0 0 1 1 ['D', 'C']
0 0 0 0 []
1 1 0 1 ['A','B','D']
0 0 0 1 ['D']
I have tried many different solutions, such as first extracting labels to a list and iterating over that, or iterating over the indexes of the dataframe:
for i in removedCols.index:
    for value in removedCols.iloc[i]['labels']:
        removedCols.at[i, value] = 1
However, these solutions seem to produce random combinations of 0's and 1's that do not match what is given in the labels column.
UPDATE: Double check your indexes.
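A minimal sketch of what this update is most likely pointing at, assuming the index of removedCols is no longer the default 0..n-1 (for example, after rows were dropped). The loop in the question takes labels from .index but reads rows with the positional .iloc, so it can read one row and write to another. The sample data here is made up for illustration; the fix is to use label-based access for both the read and the write.

```python
import pandas as pd

# Hypothetical data: the index has gaps, as it would after dropping rows.
removedCols = pd.DataFrame(
    {
        "A": [0, 0, 0],
        "B": [0, 0, 0],
        "labels": [["A"], [], ["B"]],
    },
    index=[0, 2, 5],
)

# Read and write with the same label-based accessor (.at),
# so the row being read is always the row being updated.
for i in removedCols.index:
    for value in removedCols.at[i, "labels"]:
        removedCols.at[i, value] = 1

print(removedCols)
```

Another option under the same assumption is to call removedCols.reset_index(drop=True) first, so that labels and positions coincide again.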
CodePudding user response:
Use DataFrame.update with Series.str.join and Series.str.get_dummies:
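import ast
# Only needed if the labels are stored as strings rather than lists:
# df['labels'] = df['labels'].apply(ast.literal_eval)
# Build 0/1 indicator columns from the lists and write them into df in place.
df.update(df['labels'].str.join('|').str.get_dummies())
print(df)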
A B C D labels
0 0 0 1 1 [D, C]
1 0 0 0 0 []
2 1 1 0 1 [A, B, D]
3 0 0 0 1 [D]
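To see why the one-liner works, it can help to look at the intermediate frame. The sketch below rebuilds the sample data from the question (the construction code is an assumption; only the values come from the question) and splits the chain into named steps.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0],
        "B": [0, 0, 0, 0],
        "C": [0, 0, 0, 0],
        "D": [0, 0, 0, 0],
        "labels": [["D", "C"], [], ["A", "B", "D"], ["D"]],
    }
)

# Join each list into a '|'-separated string, e.g. ['D', 'C'] becomes 'D|C'.
joined = df["labels"].str.join("|")

# Split on '|' (the default separator) into one 0/1 column per label.
dummies = joined.str.get_dummies()
print(dummies)

# Overwrite the matching cells of df in place, aligned on index and column names.
df.update(dummies)
print(df)
```

One thing to keep in mind: DataFrame.update overwrites with every non-NA value, zeros included, so a 1 that was already present in one of these columns would be reset to 0 when the label is absent from that row's list.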
CodePudding user response:
Try this:
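for idx, row in df.iterrows():
    for elm in row['labels']:
        # Skip labels that have no matching column.
        if elm in df:
            # Use .at instead of chained df[elm][idx], which may assign to a copy.
            df.at[idx, elm] = 1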
This iterates over all rows of df and, for each row, sets the columns named in its labels list to 1.