How to make a new column with a specific condition?

I want to make a new column with this condition:

If the value in the 'Case Number' column of the current row equals the value in the previous row, the new value should be taken from the 'diff' column.
If it does not equal the previous row's value, the new value should be taken from the 'lastmod-start' column.

This is the code I've tried:

df['new'] = [None] * len(df)
for i in range(1,len(df)):
    if df['Case Number'][i] == df['Case Number'][i-1]:
        df['new'][i] = df['diff']
    else:
        df['new'][i] = df['lastmod-start']

However, the code above raises an error. Can anyone help me? Thank you.


CodePudding user response：

First, you can assign a single value to a new column and pandas will broadcast it to every row, so you can initialize the 'new' column with an empty string as a placeholder.

df['new'] = ''

Next, to compare each row to the row before it, you can use the .shift() method to create a boolean index of the rows that match. Then use that index to assign the values.

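ix = df['Case Number'] == df['Case Number'].shift()
# Assign the column values (not the column names); pandas aligns them on the index
df.loc[ix, 'new'] = df.loc[ix, 'diff']
df.loc[~ix, 'new'] = df.loc[~ix, 'lastmod-start']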
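
To see the boolean-index approach end to end, here is a minimal sketch on made-up data. The column names come from the question, but the sample values and the name same_as_prev are assumptions for illustration only. It starts from 'lastmod-start' and overwrites only the rows whose 'Case Number' matches the previous row.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
df = pd.DataFrame({
    "Case Number": ["A", "A", "B", "C", "C"],
    "diff": [1, 2, 3, 4, 5],
    "lastmod-start": [10, 20, 30, 40, 50],
})

# True where the current row's Case Number equals the previous row's.
# The first row is compared with NaN, so it is always False.
same_as_prev = df["Case Number"] == df["Case Number"].shift()

# Default to 'lastmod-start', then overwrite the matching rows with 'diff'.
df["new"] = df["lastmod-start"]
df.loc[same_as_prev, "new"] = df.loc[same_as_prev, "diff"]

print(df["new"].tolist())  # [10, 2, 30, 40, 5]
```

Because the first row has no previous row, it always falls into the 'lastmod-start' branch.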

CodePudding user response：

We can use np.where together with shift() for a one-line vectorized solution:

df["new"] = np.where(df["Case Number"] == df["Case Number"].shift(1), df["diff"], df["lastmod-start"])