I am trying to update a dataframe. The update code works perfectly fine on a small test dataframe, but it does not work on a bigger one, and I cannot understand why.
selection_weights:
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
10 Africa CAF Super Cup 2.07 1.50 3.3 1.45
selection_db:
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
1 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
2 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
3 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
4 Africa African Championship Women 1.1 0.7 3.2 2.2
5 Africa African Nations Championship 1.1 0.7 3.2 2.2
6 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
7 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
8 Africa CAF Champions League 1.1 0.7 3.2 2.2
9 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
10 Africa CAF Super Cup 1.1 0.7 3.2 2.2
11 Africa CECAFA Championship 1.1 0.7 3.2 2.2
12 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
13 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
14 Africa COSAFA Cup 1.1 0.7 3.2 2.2
15 Africa Nile Basin Cup 1.1 0.7 3.2 2.2
16 Africa WAFU Cup of Nations 1.1 0.7 3.2 2.2
ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
10 Africa CAF Super Cup 2.07 1.50 3.3 1.45
11 Africa CECAFA Championship 1.10 0.70 3.2 2.20
12 Africa CECAFA Clubs Cup 1.10 0.70 3.2 2.20
13 Africa COSAFA Championship U20 1.10 0.70 3.2 2.20
14 Africa COSAFA Cup 1.10 0.70 3.2 2.20
15 Africa Nile Basin Cup 1.10 0.70 3.2 2.20
16 Africa WAFU Cup of Nations 1.10 0.70 3.2 2.20
When I change the dataframes to much bigger ones (or even just take df.head() of the bigger ones) as below:
selection_weights = selection_weights.head(10)
print(selection_weights)
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
selection_db = selection_db.head(15)
print(selection_db)
country league Win DNB O 1.5 U 4.5
140149 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
887344 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
139868 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
142111 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
140735 Africa African Championship Women 1.1 0.7 3.2 2.2
140013 Africa African Nations Championship 1.1 0.7 3.2 2.2
140352 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
142365 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
139831 Africa CAF Champions League 1.1 0.7 3.2 2.2
139738 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
934878 Africa CAF Super Cup 1.1 0.7 3.2 2.2
140675 Africa CECAFA Championship 1.1 0.7 3.2 2.2
141533 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
143054 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
139846 Africa COSAFA Cup 1.1 0.7 3.2 2.2
ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
country league Win DNB O 1.5 U 4.5
140149 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
887344 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
139868 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
142111 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
140735 Africa African Championship Women 1.1 0.7 3.2 2.2
140013 Africa African Nations Championship 1.1 0.7 3.2 2.2
140352 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
142365 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
139831 Africa CAF Champions League 1.1 0.7 3.2 2.2
139738 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
934878 Africa CAF Super Cup 1.1 0.7 3.2 2.2
140675 Africa CECAFA Championship 1.1 0.7 3.2 2.2
141533 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
143054 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
139846 Africa COSAFA Cup 1.1 0.7 3.2 2.2
Why is this happening?
CodePudding user response:
Possible cause of the problem
DataFrame.update aligns on index labels (both columns and rows) and only overwrites values where those labels match.
A merge on columns returns a frame with a fresh 0..n-1 index. In your small dataframe, selection_db also has a 0..n-1 index and the merge ids don't seem to have duplicates, so the merged frame lines up with selection_db row for row. In your large dataframe, selection_db carries row labels such as 140149 and 887344, so the merged rows no longer match any row of selection_db and nothing is updated. There might also be duplicates of the ids in selection_weights, in which case the merge produces an even larger dataframe whose index cannot line up with selection_db either.
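The index alignment described above can be checked on a toy example. This is only a sketch: the two small frames below are made-up stand-ins for the ones in the question, and the name `merged` is introduced purely for illustration.

```python
import pandas as pd

ids = ["country", "league"]

# Toy stand-ins for the frames in the question (values invented for illustration).
selection_weights = pd.DataFrame({
    "country": ["Africa", "Africa"],
    "league": ["Africa Cup of Nations", "CAF Super Cup"],
    "Win": [3.68, 2.07],
})
selection_db = pd.DataFrame(
    {
        "country": ["Africa", "Africa", "Africa"],
        "league": ["Africa Cup of Nations", "CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1, 1.1],
    },
    index=[140149, 934878, 139846],  # non-default row labels, as in the larger frame
)

merged = selection_db[ids].merge(selection_weights, how="left")
print(merged.index.tolist())        # the merge result gets a fresh 0..n-1 index
print(selection_db.index.tolist())  # selection_db keeps its own labels

before = selection_db.copy()
selection_db.update(merged)  # no row labels in common, so nothing is overwritten
unchanged = selection_db.equals(before)
print(unchanged)
```

If `selection_db` is first given a default index (for example with `reset_index(drop=True)`), the same `update` call finds matching labels again, which is why the small test frame appeared to work.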
Solution (merge not required)
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()
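As a self-contained check of the approach above, here is a minimal sketch on invented data, including one duplicated key in `selection_weights`. Note that `set_index(ids)` followed by `reset_index()` discards the original row labels; the `original_index` variable is a hypothetical addition for cases where those labels need to be kept, and it is not part of the solution above.

```python
import pandas as pd

ids = ["country", "league"]

# Toy data (invented values); "CAF Super Cup" appears twice on purpose.
selection_weights = pd.DataFrame({
    "country": ["Africa", "Africa", "Africa"],
    "league": ["Africa Cup of Nations", "CAF Super Cup", "CAF Super Cup"],
    "Win": [3.68, 2.07, 9.99],
})
selection_db = pd.DataFrame(
    {
        "country": ["Africa", "Africa", "Africa"],
        "league": ["Africa Cup of Nations", "CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1, 1.1],
    },
    index=[140149, 934878, 139846],
)

# Optional: remember the row labels so they can be restored afterwards.
original_index = selection_db.index

# Align both frames on the key columns instead of on row position.
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()

# Optional: put the original labels back (row order is unchanged by the steps above).
selection_db.index = original_index

print(selection_db)
```

`drop_duplicates(ids)` keeps the first row for each key by default, so the duplicated league takes its first listed weight; rows of `selection_db` with no matching key keep their old values.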