I want to make a new column with this condition:
- If the the value on Case Number column in current row equals with the previous row, then the value should be taken from column 'diff'
- If the current row is not equal with the previous row, then the value should be taken from 'lastmod-start' column.
This is the code that I've tried :
df['new'] = [None] * len(df)
for i in range(1,len(df)):
if df['Case Number'][i] == df['Case Number'][i-1]:
df['new'][i] = df['diff']
else:
df['new'][i] = df['lastmod-start']
However the code above is resulting an error. Is the anyone can help me? Thank you.
This is the screenshot The result
CodePudding user response:
First, you can assign a new column as a single value and pandas will broadcast it to the entire column. Since the 'new'
column will be string, you can just initialize it to an empty string.
df['new'] = ''
Next, if you want to compare each row to the row before it, you can use the .shift()
method create a boolean index of which rows match. Then use that index to assign the values.
ix = df['Case Number'] == df['Case Number'].shift()
df.loc[ix, 'new'] = 'diff'
df.loc[~ix, 'new'] = 'lastmod-start'
CodePudding user response:
We can use np.where
along with shift()
for a one line vectorized solution here:
df["new"] = np.where(df["Case Number"] == df["Case Number"].shift(1), df["diff"], df["lastmod-start"])