I need to assign a whole dataframe to part of another, bigger dataframe based on some conditions.
I have two dataframes. The first is, let's say:
import pandas as pd
df_1 = pd.DataFrame({
'A': [0, 0, 1, 1, 2, 2],
'B': [1, 2, 3, 4, 5, 6],
'C': ['a', 'b', 'c', 'd', 'e', 'f']
})
and
df_2 = pd.DataFrame({
'A': [0, 0, 0],
'B': [5, 5, 6],
'C': ['z', 'z', 't']
})
What I want to do is something like:
df_1.loc[df_1.A == 0][[ 'B', 'C' ]] = df_2[['B', 'C']]
to get the values of df_2 into df_1. The result I get is that the rows of df_1 with A == 0 become NaN.
How can I fix this issue? Thanks for the answers.
CodePudding user response:
Written as a single .loc[rows, columns] assignment, your solution works with the sample data because the indices of the first 2 rows of df_1 and df_2 match, which obviously will not be the case in real data:
df_1.loc[df_1.A == 0, [ 'B', 'C' ]] = df_2[['B', 'C']]
print (df_1)
A B C
0 0 5 z
1 0 5 z
2 1 3 c
3 1 4 d
4 2 5 e
5 2 6 f
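To see why matching indices matter, here is a minimal sketch. It assumes df_1 carries string labels (the index=list('efghik') below is only an illustration) and uses reindex to approximate what label alignment does to the right-hand side during the assignment:

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [0, 0, 1, 1, 2, 2],
    'B': [1, 2, 3, 4, 5, 6],
    'C': ['a', 'b', 'c', 'd', 'e', 'f'],
}, index=list('efghik'))  # assumed non-default labels, for illustration only

df_2 = pd.DataFrame({
    'A': [0, 0, 0],
    'B': [5, 5, 6],
    'C': ['z', 'z', 't'],
})

m = df_1.A == 0
target_labels = df_1.index[m]  # the labels 'e' and 'f'

# df_2 is labelled 0, 1, 2, so looking it up by 'e' and 'f' finds nothing:
# every cell of the aligned frame is missing.
aligned = df_2[['B', 'C']].reindex(target_labels)
print(aligned)
print(aligned.isna().all().all())
```

Because none of the target labels exist in df_2, the aligned frame holds only missing values, and those are what end up in df_1.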
For a general solution, the index of df_1 is changed below; with the solution above you would get NaNs.
You can take as many rows of df_2 as there are Trues in the mask (counted with sum) and assign a NumPy array:
df_1 = pd.DataFrame({
'A': [0, 0, 1, 1, 2, 2],
'B': [1, 2, 3, 4, 5, 6],
'C': ['a', 'b', 'c', 'd', 'e', 'f']
}, index=list('efghik'))
m = df_1.A == 0
df_1.loc[m, [ 'B', 'C' ]] = df_2[['B', 'C']].iloc[:m.sum()].to_numpy()
print (df_1)
A B C
e 0 5 z
f 0 5 z
g 1 3 c
h 1 4 d
i 2 5 e
k 2 6 f
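The positional assignment above only works if df_2 has at least as many rows as the mask selects. A hedged sketch of a guard around it follows; assign_positionally is a hypothetical helper name, not a pandas function, and it assigns one column at a time so that each column's values keep their own dtype:

```python
import pandas as pd

def assign_positionally(target, mask, source, cols):
    """Hypothetical helper: copy the first mask.sum() rows of `source`
    into the masked rows of `target`, ignoring index labels."""
    n = int(mask.sum())
    if len(source) < n:
        raise ValueError(f"need {n} rows, but source has only {len(source)}")
    for col in cols:
        # Plain NumPy arrays carry no labels, so no alignment takes place.
        target.loc[mask, col] = source[col].to_numpy()[:n]
    return target

df_1 = pd.DataFrame({
    'A': [0, 0, 1, 1, 2, 2],
    'B': [1, 2, 3, 4, 5, 6],
    'C': ['a', 'b', 'c', 'd', 'e', 'f'],
}, index=list('efghik'))

df_2 = pd.DataFrame({
    'A': [0, 0, 0],
    'B': [5, 5, 6],
    'C': ['z', 'z', 't'],
})

m = df_1.A == 0
assign_positionally(df_1, m, df_2, ['B', 'C'])
print(df_1)
```

Raising an error when df_2 is too short is just one possible choice; depending on the data you might prefer to fill only the rows that are available.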
Another idea is to rename the index of df_2 so that it matches the selected rows of df_1:
m = df_1.A == 0
df_1.loc[m, [ 'B', 'C' ]] = df_2[['B', 'C']].rename(dict(zip(df_2.index, df_1.index[m])))
print (df_1)
A B C
e 0 5 z
f 0 5 z
g 1 3 c
h 1 4 d
i 2 5 e
k 2 6 f
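To make the rename step easier to follow, this sketch prints the mapping it builds. It reuses the sample frames from above; the names mapping and renamed are introduced here for illustration only:

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [0, 0, 1, 1, 2, 2],
    'B': [1, 2, 3, 4, 5, 6],
    'C': ['a', 'b', 'c', 'd', 'e', 'f'],
}, index=list('efghik'))

df_2 = pd.DataFrame({
    'A': [0, 0, 0],
    'B': [5, 5, 6],
    'C': ['z', 'z', 't'],
})

m = df_1.A == 0

# zip stops at the shorter input, so only the first two labels of df_2
# are paired with the two selected labels of df_1.
mapping = dict(zip(df_2.index, df_1.index[m]))
print(mapping)

# The third row of df_2 keeps its old label 2, which matches nothing in df_1.
renamed = df_2[['B', 'C']].rename(mapping)
print(renamed.index.tolist())

# Label alignment now finds 'e' and 'f' and ignores the leftover row.
df_1.loc[m, ['B', 'C']] = renamed
print(df_1)
```

Note that this variant still pairs rows by position when building the mapping, so it assumes the rows of df_2 are already in the order in which they should land in df_1.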