Dataframe update code runs perfectly on a test dataframe but not on a larger dataframe

Time:05-30

I am trying to update a dataframe. The update code works perfectly on a test dataframe but has no effect on a bigger one, and I cannot understand why.

selection_weights:
   country                        league   Win   DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1   Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2   Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3   Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4   Africa    African Championship Women  2.07  1.50    3.3   1.45
5   Africa  African Nations Championship  2.07  1.50    3.3   1.45
6   Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7   Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8   Africa          CAF Champions League  2.07  1.50    3.3   1.45
9   Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45
10  Africa                 CAF Super Cup  2.07  1.50    3.3   1.45

selection_db:
   country                        league  Win  DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
1   Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
2   Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
3   Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
4   Africa    African Championship Women  1.1  0.7    3.2    2.2
5   Africa  African Nations Championship  1.1  0.7    3.2    2.2
6   Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
7   Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
8   Africa          CAF Champions League  1.1  0.7    3.2    2.2
9   Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
10  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
11  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
12  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
13  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
14  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2
15  Africa                Nile Basin Cup  1.1  0.7    3.2    2.2
16  Africa           WAFU Cup of Nations  1.1  0.7    3.2    2.2

ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))

print(selection_db)
   country                        league   Win   DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1   Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2   Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3   Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4   Africa    African Championship Women  2.07  1.50    3.3   1.45
5   Africa  African Nations Championship  2.07  1.50    3.3   1.45
6   Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7   Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8   Africa          CAF Champions League  2.07  1.50    3.3   1.45
9   Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45
10  Africa                 CAF Super Cup  2.07  1.50    3.3   1.45
11  Africa           CECAFA Championship  1.10  0.70    3.2   2.20
12  Africa              CECAFA Clubs Cup  1.10  0.70    3.2   2.20
13  Africa       COSAFA Championship U20  1.10  0.70    3.2   2.20
14  Africa                    COSAFA Cup  1.10  0.70    3.2   2.20
15  Africa                Nile Basin Cup  1.10  0.70    3.2   2.20
16  Africa           WAFU Cup of Nations  1.10  0.70    3.2   2.20

When I change the dataframes to much bigger ones (or even just their .head(), as below):

selection_weights = selection_weights.head(10)
print(selection_weights)
  country                        league   Win   DNB  O 1.5  U 4.5
0  Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1  Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2  Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3  Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4  Africa    African Championship Women  2.07  1.50    3.3   1.45
5  Africa  African Nations Championship  2.07  1.50    3.3   1.45
6  Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7  Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8  Africa          CAF Champions League  2.07  1.50    3.3   1.45
9  Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45

selection_db = selection_db.head(15)
print(selection_db)
       country                        league  Win  DNB  O 1.5  U 4.5
140149  Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
887344  Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
139868  Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
142111  Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
140735  Africa    African Championship Women  1.1  0.7    3.2    2.2
140013  Africa  African Nations Championship  1.1  0.7    3.2    2.2
140352  Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
142365  Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
139831  Africa          CAF Champions League  1.1  0.7    3.2    2.2
139738  Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
934878  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
140675  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
141533  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
143054  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
139846  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2

ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
       country                        league  Win  DNB  O 1.5  U 4.5
140149  Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
887344  Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
139868  Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
142111  Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
140735  Africa    African Championship Women  1.1  0.7    3.2    2.2
140013  Africa  African Nations Championship  1.1  0.7    3.2    2.2
140352  Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
142365  Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
139831  Africa          CAF Champions League  1.1  0.7    3.2    2.2
139738  Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
934878  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
140675  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
141533  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
143054  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
139846  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2

Why is this happening?

CodePudding user response:

Possible cause of the problem

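DataFrame.update aligns on index labels (both rows and columns) and only overwrites values where the labels match.

In your small dataframe, selection_db has the default 0..n-1 row index and the merge ids have no duplicates, so the merged dataframe ends up with the same index as selection_db and the update works. In the larger case, your printed selection_db has non-default row labels (140149, 887344, ...), whereas a merge on columns returns a fresh 0..n-1 index, so none of the row labels match and update changes nothing. Separately, if selection_weights contains duplicate ids, the merge produces more rows than selection_db has, and its index will not line up with selection_db even when selection_db does have a default index.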
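
To see the alignment issue in isolation, here is a minimal sketch. The two small frames are made up for illustration (the names db and weights and the cut-down columns are assumptions, not your real data); the index labels on db mimic the ones in your larger dataframe.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames; the non-default
# index labels on db mimic the larger dataframe in the question.
db = pd.DataFrame(
    {
        "country": ["Africa", "Africa"],
        "league": ["CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1],
    },
    index=[934878, 139846],
)
weights = pd.DataFrame(
    {"country": ["Africa"], "league": ["CAF Super Cup"], "Win": [2.07]}
)

ids = ["country", "league"]
merged = db[ids].merge(weights, how="left")

# Merging on columns discards db's row labels and numbers the result 0..n-1.
print(list(merged.index))  # [0, 1]
print(list(db.index))      # [934878, 139846]

# update() aligns on row labels; none overlap, so nothing changes.
db.update(merged)
print(db["Win"].tolist())  # [1.1, 1.1]
```

Comparing selection_db.index with the index of the merged frame in this way is a quick check for whether this is what is happening in your data.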

Solution (merge not required)

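# Index selection_db by the id columns so update() aligns rows on (country, league).
selection_db = selection_db.set_index(ids)
# Drop repeated ids first so each (country, league) maps to a single row of weights.
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
# Restore the ids as columns; the rows get a fresh 0..n-1 index.
selection_db = selection_db.reset_index()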
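
If you need to keep selection_db's original row labels (the set_index/reset_index round trip above replaces them with 0..n-1), one alternative is to keep the merge and copy the original index onto the merged frame before calling update. This is only a sketch on made-up data (db and weights are illustrative names). It assumes duplicate ids are dropped from the weights first, so that the left merge returns exactly one row per row of db, in the same order.

```python
import pandas as pd

# Hypothetical miniature frames; weights deliberately contains a duplicate id.
db = pd.DataFrame(
    {
        "country": ["Africa", "Africa"],
        "league": ["CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1],
    },
    index=[934878, 139846],
)
weights = pd.DataFrame(
    {
        "country": ["Africa", "Africa"],
        "league": ["CAF Super Cup", "CAF Super Cup"],
        "Win": [2.07, 2.07],
    }
)

ids = ["country", "league"]

# With unique ids on the right, a left merge keeps one row per row of db
# and preserves db's row order.
merged = db[ids].merge(weights.drop_duplicates(ids), how="left")

# Give the merged frame db's row labels so update() can align on them.
merged.index = db.index

# Rows without a match hold NaN in merged and are left untouched by update().
db.update(merged)
print(db)
```

Here the CAF Super Cup row picks up the new Win value, the COSAFA Cup row keeps its old one, and the original index labels survive.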