Adding and updating a pandas column based on conditions of other columns

Time:02-15

So I have a DataFrame with over 1 million rows.

One column, 'activity', contains integers from 1 to 12. I added a new, empty column called 'label'.

The 'label' column needs to be filled with 0 or 1 based on the value in 'activity'.

If activity is 1, 2, 3, 6, 7, or 8, label should be 0; otherwise it should be 1.
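As a reference point, the rule can be written as a plain Python function that uses a set membership test instead of a chain of `==` comparisons. This is a minimal sketch; the names `ZERO_ACTIVITIES` and `label_for` are hypothetical and only illustrate the mapping for every possible activity value:

```python
# The labelling rule from the question, written as a plain function.
ZERO_ACTIVITIES = {1, 2, 3, 6, 7, 8}  # hypothetical name: activities that get label 0


def label_for(activity: int) -> int:
    """Return 0 for activities 1, 2, 3, 6, 7, 8 and 1 for everything else."""
    return 0 if activity in ZERO_ACTIVITIES else 1


# Expected label for every possible activity value 1..12.
table = {a: label_for(a) for a in range(1, 13)}
print(table)
# {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0, 7: 0, 8: 0, 9: 1, 10: 1, 11: 1, 12: 1}
```

Calling a function like this once per row is still a Python-level loop, so it clarifies the rule but does not by itself fix the speed problem.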

Here is what I am currently doing:

import pandas as pd

df = pd.read_csv('data.csv')
df['label'] = ''
for index, row in df.iterrows():
    if (row['activity'] == 1 or row['activity'] == 2 or row['activity'] == 3 or
            row['activity'] == 6 or row['activity'] == 7 or row['activity'] == 8):
        df.loc[index, 'label'] = 0  # assignment (=), not comparison (==)
    else:
        df.loc[index, 'label'] = 1
df.to_csv('data.csv', index=False)

This is very inefficient and takes too long to run. Are there any optimizations? Could NumPy arrays help? And is there a way to make the code cleaner?

Answer:

Use numpy.where with Series.isin:

import numpy as np

df['label'] = np.where(df['activity'].isin([1, 2, 3, 6, 7, 8]), 0, 1)
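For a self-contained check of the `np.where` approach in the question's read/label/write workflow, here is a small sketch. It assumes an in-memory CSV (via `io.StringIO`) standing in for `data.csv`; the sample rows are made up for illustration:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: a few rows with an 'activity' column.
csv_text = "activity\n1\n4\n6\n9\n12\n3\n"
df = pd.read_csv(io.StringIO(csv_text))

# Vectorized labelling: isin builds one boolean mask over the whole column,
# and np.where picks 0 where the mask is True and 1 elsewhere.
zero_activities = [1, 2, 3, 6, 7, 8]
df["label"] = np.where(df["activity"].isin(zero_activities), 0, 1)

print(df["label"].tolist())  # [0, 1, 0, 1, 1, 0]

# Writing back works the same as in the question; a buffer replaces the file here.
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

There is no need to pre-create the column with `df['label'] = ''`; assigning the array creates it with an integer dtype directly.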

Or invert the boolean mask and cast it to int, so that True becomes 1 and False becomes 0:

df['label'] = (~df['activity'].isin([1, 2, 3, 6, 7, 8])).astype(int)
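On the question of using NumPy arrays directly: since 'activity' only takes the values 1 to 12, one alternative (not from the answer above) is a lookup table indexed by activity value. This sketch assumes the column holds integers in 1..12 with no NaN; a float column or an out-of-range value would make the indexing fail. The names `lut`, `by_mask`, and `by_lut` are hypothetical, and the sample data is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; assumes 'activity' holds integers 1..12 with no NaN.
df = pd.DataFrame({"activity": [1, 4, 6, 9, 12, 3, 7, 8]})

# Approach from the answer: invert the isin mask and cast booleans to int.
by_mask = (~df["activity"].isin([1, 2, 3, 6, 7, 8])).astype(int)

# Lookup-table alternative on the raw NumPy array: position i holds the label
# for activity value i (position 0 is unused padding).
lut = np.ones(13, dtype=np.int64)        # default label 1 for positions 0..12
lut[[1, 2, 3, 6, 7, 8]] = 0              # activities that get label 0
by_lut = lut[df["activity"].to_numpy()]  # fancy indexing: one lookup per row

df["label"] = by_lut
print(df["label"].tolist())  # [0, 1, 0, 1, 1, 0, 0, 0]
print((by_mask.to_numpy() == by_lut).all())  # True
```

Both versions avoid the per-row Python loop; the `isin` forms are more general, while the lookup table relies on the small, known integer range.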