Replace multiple column values if a value is the same in both data frames

df1:

    0   1   2   3   4   5   6   7   8   9
0   1   Биир    биир    NUM num NumType=Card    _   _   _   _
1   2   паартаҕа    паарта  NOUN    n   Case=Dat|Number=Sing    _   _   _   _
2   3   киһи    киһи    NOUN    n   Case=Nom|Number=Sing    _   _   _   _
3   4   олорор  олор    VERB    v   Person=3|Tense=Pres    _   _   _   _
4   5   .   .   PUNCT   punct   _   _   _   _   _

df2:

    0   1   2   3   4   5   6   7   8   9
0   1   Биир    _   _   _   _   _   _   _   _
1   2   уол _   _   _   _   _   _   _   _
2   3   турар   _   _   _   _   _   _   _   _
3   4   уонна   _   _   _   _   _   _   _   _
4   5   ааҕар   _   _   _   _   _   _   _   _
5   6   .   _   _   _   _   _   _   _   _

How do I replace specific columns if a value from the second df is in the first one?

df2[1].isin(df1[1])

0     True
1    False
2    False
3    False
4    False
5     True

For every row where the result is True, replace columns 2, 3, 4 and 5 with the values from the matching row of df1. The output should be this:

    0   1   2   3   4   5   6   7   8   9
0   1   Биир    биир    NUM num NumType=Card    _   _   _   _
1   2   уол _   _   _   _   _   _   _   _
2   3   турар   _   _   _   _   _   _   _   _
3   4   уонна   _   _   _   _   _   _   _   _
4   5   ааҕар   _   _   _   _   _   _   _   _
5   6   .   .   PUNCT   punct   _   _   _   _   _

I tried using where, but it raises an error because the two DataFrames have different lengths.

df2[[2, 3, 4, 5]].where(df2[1].isin(df1[1]), df1[[2, 3, 4, 5]].values)

Is there any other way to replace multiple columns by a specific condition?

Answer:

One way is to (1) concatenate the two DataFrames, (2) drop duplicates on column 1, (3) keep only the rows whose column 1 value appears in df2, and (4) sort by column 0:

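import pandas as pd
# Column labels here are strings ('0', '1'); use integers (0, 1) if the frames are labelled as in the question.
df = pd.concat([df2, df1]).drop_duplicates('1', keep='last')  # per token, keep the df1 row
df = df[df['1'].isin(df2['1'])].sort_values('0')  # only df2's tokens, ordered by column '0'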

df:

	0	1	2	3	4	5	6	7	8	9
0	1	Биир	биир	NUM	num	NumType=Card	_	_	_	_
1	2	уол	_	_	_	_	_	_	_	_
2	3	турар	_	_	_	_	_	_	_	_
3	4	уонна	_	_	_	_	_	_	_	_
4	5	ааҕар	_	_	_	_	_	_	_	_
4	5	.	.	PUNCT	punct	_	_	_	_	_
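
Note that the concat approach takes the whole row from df1, so the final row keeps df1's index 4 and the value 5 in column 0, whereas the output requested in the question has 6 there. If df2's index and column 0 need to stay untouched, another option is to look up only columns 2 to 5 by the token in column 1. The following is a sketch, not a definitive solution: it assumes integer column labels as in the question, rebuilds both frames by hand from the printed tables, and the names target, lookup, mask and result are made up for illustration.

```python
import pandas as pd

cols = list(range(10))

# Frames rebuilt by hand from the tables printed in the question.
df1 = pd.DataFrame(
    [
        [1, "Биир", "биир", "NUM", "num", "NumType=Card", "_", "_", "_", "_"],
        [2, "паартаҕа", "паарта", "NOUN", "n", "Case=Dat|Number=Sing", "_", "_", "_", "_"],
        [3, "киһи", "киһи", "NOUN", "n", "Case=Nom|Number=Sing", "_", "_", "_", "_"],
        [4, "олорор", "олор", "VERB", "v", "Person=3|Tense=Pres", "_", "_", "_", "_"],
        [5, ".", ".", "PUNCT", "punct", "_", "_", "_", "_", "_"],
    ],
    columns=cols,
)
tokens = ["Биир", "уол", "турар", "уонна", "ааҕар", "."]
df2 = pd.DataFrame(
    [[i + 1, tok] + ["_"] * 8 for i, tok in enumerate(tokens)],
    columns=cols,
)

target = [2, 3, 4, 5]  # columns to overwrite (hypothetical name)

# One row per token from df1, indexed by the token itself.
# If a token occurs several times in df1, the first occurrence wins.
lookup = df1.drop_duplicates(subset=[1]).set_index(1)[target]

# Rows of df2 whose token also exists in df1.
mask = df2[1].isin(lookup.index)

# Copy df2 and overwrite only the target columns of the matching rows,
# pulling the replacement values from df1 in df2's row order.
result = df2.copy()
result.loc[mask, target] = lookup.loc[df2.loc[mask, 1].tolist()].to_numpy()

print(result)
```

Because the replacement values are selected by token rather than by position, the differing lengths of the two frames do not matter here. The where attempt in the question fails for that reason: df1[[2, 3, 4, 5]].values has 5 rows, while df2 has 6.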