What causes different multiplication results when the DataFrame has column names?

Time:12-03

I tested these two snippets. The DataFrames have the same structure except for df.columns, so what causes the difference between them? And how should I change my second snippet: should I always use pandas.DataFrame.mul, or is there another way to avoid this?

import numpy as np
import pandas as pd

# test1
df = pd.DataFrame(np.random.randint(100, size=(10, 10))) \
    .assign(Count=np.random.rand(10))
df.iloc[:, 0:3] *= df['Count']
df
Out[1]:
           0          1          2   3   4   5   6   7   8   9     Count
0  26.484949  68.217006   4.902341  61  10  13  31  15  10  11  0.645974
1  56.845743  70.085965  28.106758  79  56  47  82  83  62  40  0.934480
2  33.590667  78.496281   1.634114  94   3  91  16  41  93  55  0.326823
3  51.031974  15.886152  26.145821  67  31  20  81  21  10   8  0.012706
4  47.156128  82.234199  10.458328  24   8  68  44  24   4  50  0.517130
5  18.733256  61.675649  23.531239  74  61  97  20  12   0  95  0.360815
6   4.521820  26.165427  26.145821  68  10  77  67  92  82  11  0.606739
7  24.547026  62.610129  23.531239  50  45  69  94  56  77  56  0.412445
8  52.969897  75.692843   9.804683  73  74   5  10  60  51  77  0.125309
9  21.963128  30.837825  19.609366  75   9  50  68  10  82  96  0.687966
# test2
df = pd.DataFrame(np.random.randint(100, size=(10, 10))) \
    .assign(Count=np.random.rand(10))
df.columns = ['find', 'a', 'b', 3, 4, 5, 6, 7, 8, 9, 'Count']
df.iloc[:, 0:3] *= df['Count']
df
Out[2]:
   find   a   b   3   4   5   6   7   8   9     Count
0   NaN NaN NaN  63  63  47  81   3  48  34  0.603953
1   NaN NaN NaN  70  48  41  27  78  75  23  0.839635
2   NaN NaN NaN   5  38  52  23   3  75   4  0.515159
3   NaN NaN NaN  40  49  31  25  63  48  25  0.483255
4   NaN NaN NaN  42  89  46  47  78  30   5  0.693555
5   NaN NaN NaN  68  83  81  87   7  54   3  0.108306
6   NaN NaN NaN  74  48  99  67  80  81  36  0.361500
7   NaN NaN NaN  10  19  26  41  11  24  33  0.705899
8   NaN NaN NaN  38  51  83  78   7  31  42  0.838703
9   NaN NaN NaN   2   7  63  14  28  38  10  0.277547

Answer:

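df.iloc[:, 0:3] is a DataFrame with three columns, named find, a, and b. df['Count'] is a Series whose index holds the row labels 0 to 9. When you multiply a DataFrame by a Series, pandas aligns the Series index with the DataFrame's column labels, not with its rows. None of the labels 0 to 9 matches find, a, or b, so every cell of the result is NaN. *= then assigns those NaNs back to the DataFrame. In test1, the column labels 0, 1, and 2 happen to match labels in the Series, so you get numbers. However, each of those columns is multiplied by a single value (Count[0], Count[1], and Count[2] respectively), not row by row. That is why column 2 shows 26.145821 in both row 3 and row 6.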
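Here is a minimal sketch of that alignment rule. It uses hypothetical toy frames; df_int, df_str, and count are illustrative names, not taken from the question. The same Series either scales whole columns or produces nothing but NaN, depending on whether its index labels happen to match the column labels.

```python
import pandas as pd

# Hypothetical toy data: two rows, integer column labels 0 and 1 (as in test1)
df_int = pd.DataFrame({0: [1.0, 2.0], 1: [10.0, 20.0]})
# A Series whose index (0, 1) holds ROW labels, like df['Count']
count = pd.Series([0.5, 3.0])

# DataFrame * Series aligns the Series index with the DataFrame's COLUMN labels:
# column 0 is scaled by count[0] (0.5) and column 1 by count[1] (3.0),
# giving [0.5, 1.0] and [30.0, 60.0] instead of a row-by-row product.
aligned = df_int * count

# Same values with string column labels (as in test2): no label matches,
# so the result holds the union of labels and 'find'/'a' are entirely NaN.
df_str = pd.DataFrame({"find": [1.0, 2.0], "a": [10.0, 20.0]})
misaligned = df_str * count

print(aligned)
print(misaligned)
```

A row-by-row product would have given [0.5, 6.0] for column 0. The per-column result is what test1 silently produced.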

Using .mul with axis=0 (or axis='index') gets around this. It aligns df['Count'] with the DataFrame's row labels instead of its columns, so the result no longer depends on what the columns are called.
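Here is a sketch of the axis=0 fix applied to the test2 layout. It uses a seeded np.random.default_rng only so the run is reproducible; the question used np.random.randint and np.random.rand. It also writes the result back by column label instead of through iloc. The names before and cols are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the test2 layout; the seeded generator is only for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(100, size=(10, 10))).assign(Count=rng.random(10))
df.columns = ["find", "a", "b", 3, 4, 5, 6, 7, 8, 9, "Count"]

# Float copy of the original values, kept only to check the result afterwards
before = df.iloc[:, 0:3].to_numpy(dtype=float)

# axis=0 aligns df["Count"] with the ROW labels, so each row of the first
# three columns is scaled by that row's Count, whatever the columns are named.
cols = df.columns[0:3]
df[cols] = df[cols].mul(df["Count"], axis=0)

print(df)
```

Writing back with df.iloc[:, 0:3] = df.iloc[:, 0:3].mul(df['Count'], axis=0) should give the same values. On recent pandas versions, however, storing floats in the integer columns through iloc may emit a FutureWarning about incompatible dtypes. Replacing the columns by label avoids that. If you would rather skip label alignment altogether, you can multiply the underlying arrays by position instead: df.iloc[:, 0:3].to_numpy() * df['Count'].to_numpy()[:, None].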
