Home > Software engineering >  New column from another column if value, if not, take value from another column Python Pandas
New column from another column if value, if not, take value from another column Python Pandas

Time:10-08

Most efficient for a big dataset in pandas:

I would like to add a new column Z taking the value from X if there is a value, if not, I want to take the value from Y.

Another thing, it there a possibility to use ternary operations to add a new column Z based on, if column Y exist then column Y - column X, if not, then only X.

I'm looking for the most efficient way in both cases.

Thank you

CodePudding user response:

Use numpy.where:

np.random.seed(123)

N = 10000
df = pd.DataFrame({'X':np.random.choice([np.nan, 1], size=N),
                   'Y':np.random.choice([3,4,6], size=N)})

df['Z1'] = np.where(df['X'].isna(), df['Y'],df['X'])

if 'Y' in df.columns:
    df['Z2'] = np.where(df['X'] - df['Y'], df['Y'],df['X'])
else:
    df['Z2'] = df['X']

print (df)
        X  Y   Z1   Z2
0     NaN  6  6.0  6.0
1     1.0  4  1.0  4.0
2     NaN  6  6.0  6.0
3     NaN  3  3.0  3.0
4     NaN  3  3.0  3.0
  ... ..  ...  ...
9995  1.0  6  1.0  6.0
9996  1.0  6  1.0  6.0
9997  NaN  6  6.0  6.0
9998  1.0  4  1.0  4.0
9999  1.0  6  1.0  6.0

[10000 rows x 4 columns]
  • Related