Adding and updating a pandas column based on conditions of other columns

I have a DataFrame with over 1 million rows.

One column, 'activity', holds numbers from 1 to 12. I added a new, empty column called 'label'.

The 'label' column needs to be filled with 0 or 1 based on the values in 'activity'.

If activity is 1, 2, 3, 6, 7, or 8, label should be 0; otherwise it should be 1.

Here is what I am currently doing:

import pandas as pd

df = pd.read_csv('data.csv')
df['label'] = ''
# Slow: visits every row in a Python-level loop
for index, row in df.iterrows():
    if (row['activity'] == 1 or row['activity'] == 2 or row['activity'] == 3 or row['activity'] == 6 or row['activity'] == 7 or row['activity'] == 8):
        df.loc[index, 'label'] = 0
    else:
        df.loc[index, 'label'] = 1
df.to_csv('data.csv', index=False)

This is very inefficient and takes too long to run. Are there any optimizations, possibly using NumPy arrays? And is there any way to make the code cleaner?

CodePudding user response:

Use numpy.where with Series.isin:

import numpy as np
df['label'] = np.where(df['activity'].isin([1, 2, 3, 6, 7, 8]), 0, 1)

Or invert the boolean mask and cast it to integers, so that False/True become 0/1:

df['label'] = (~df['activity'].isin([1, 2, 3, 6, 7, 8])).astype(int)
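
To check that the two one-liners above agree, here is a minimal sketch on a small made-up DataFrame. The sample values and the names `zero_activities`, `mask`, `label_where` and `label_cast` are only illustrative; they stand in for the real data.csv, which is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for data.csv: one row per activity value 1..12
df = pd.DataFrame({"activity": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})

# Activities that should get label 0 (taken from the question)
zero_activities = [1, 2, 3, 6, 7, 8]

# Boolean Series: True where activity is one of the listed values
mask = df["activity"].isin(zero_activities)

# Approach 1: pick 0 where the mask is True, 1 elsewhere
label_where = np.where(mask, 0, 1)

# Approach 2: invert the mask, then cast False/True to 0/1
label_cast = (~mask).astype(int)

df["label"] = label_where
print(df["label"].tolist())
# [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
```

Both versions work on the whole column at once instead of looping row by row, which is likely where most of the speedup over iterrows comes from. Note that a missing activity value (NaN) is not in the list, so it would end up with label 1 in either version.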