I have two columns and I'm trying to make a new one depending if one has a null value or not. I have the following example and I´m trying to use the np.where() function but it doesn´t seems to work.
import pandas as pd
import numpy as np
# DF 1
a = pd.DataFrame([1,'nan',2],columns=['a1'])
# DF 2
b = pd.DataFrame(['hola','hola','hola'],columns=['b1'])
# New Column
b['b2'] = np.where(a['a1'].astype(str) != 'nan', b['b1'] a['a1'].astype(str))
The result for the new column 'b2' should be:
hola1
hola
hola2
The np.where
function also doesn´t has like an else option so I don´t know how to include that. I appreciate the help!
CodePudding user response:
You need to "synchronize" a
and b
dataframes to make pairwise comparisons (possibly with pd.concat
):
b['b2'] = pd.concat([a.replace({'nan': ''}), b], axis=1).apply(lambda x:x['b1'] str(x['a1']), axis=1)
print(b)
b1 b2
0 hola hola1
1 hola hola
2 hola hola2
CodePudding user response:
This is an alternative solution using list comprehension:
b['b2'] = [str(j) if i=='nan' else str(j) str(i) for i, j in zip(a.a1,b.b1) ]
print(b)
b1 b2
0 hola hola1
1 hola hola
2 hola hola2
I have used zip to loop both columns together.
CodePudding user response:
You can do a simple string concatenation:
>>> b['b1'] a['a1'].astype(str).replace('nan', '')
0 hola1
1 hola
2 hola2
dtype: object
CodePudding user response:
@RomanPerekhrest's solution is fine and you should upvote it. There is a vectorial solution if you like to try.
df = pd.concat([a, b], axis=1)
df["b2"] = df["b1"] df["a1"].astype("str")\
.str.replace("nan", "")
which returns
a1 b1 b2
0 1 hola hola1
1 nan hola hola
2 2 hola hola2
In case you really want to use np.where
you can try
df["b3"] = np.where(
df['a1'].eq('nan'),
df['b1'],
df['b1'] df['a1'].astype(str))
which leads to
a1 b1 b2 b3
0 1 hola hola1 hola1
1 nan hola hola hola
2 2 hola hola2 hola2
EDIT - Timing
Here a small experiment when we assume a, and b have 30.000 rows.
Data
import pandas as pd
n = 10_000
a = pd.DataFrame([1,'nan',2],columns=['a1'])
# DF 2
b = pd.DataFrame(['hola','hola','hola'],columns=['b1'])
a = pd.concat([a for i in range(n)],
ignore_index=True)
b = pd.concat([b for i in range(n)],
ignore_index=True)
a_bk = a.copy()
b_bk = b.copy()
@RomanPerekhrest's solution
%%timeit -n 10 -r 3
a = a_bk.copy()
b = b_bk.copy()
b['b2'] = pd.concat(
[a.replace({'nan': ''}),
b],
axis=1)\
.apply(lambda x:x['b1'] str(x['a1']), axis=1)
305 ms ± 17.8 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
Vectorial solution
%%timeit -n 10 -r 3
a = a_bk.copy()
b = b_bk.copy()
df = pd.concat([a, b], axis=1)
df["b2"] = df["b1"] df["a1"].astype("str")\
.str.replace("nan", "")
33.6 ms ± 4.34 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
Conclusion
Vectorial solution is 9x faster than using an apply.