I have this very simple pandas DataFrame called "sa":
>sa.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   searchAppearance  7 non-null      object
 1   clicks            7 non-null      int64
 2   impressions       7 non-null      int64
 3   ctr               7 non-null      float64
 4   position          7 non-null      float64
dtypes: float64(2), int64(2), object(1)
memory usage: 408.0 bytes
with these values:
>print(sa)
      searchAppearance  clicks  impressions       ctr   position
0      AMP_TOP_STORIES     376          376  0.022917   8.108978
1        AMP_BLUE_LINK   55670        55670  0.051522  13.158574
2      PAGE_EXPERIENCE   68446        68446  0.039298  20.056293
3       RECIPE_FEATURE   40175        40175  0.042920   4.186674
4  RECIPE_RICH_SNIPPET   37428        37428  0.069153  18.726152
5                VIDEO      72           72  0.025361  15.896090
6              WEBLITE       1            1  0.001055  51.493671
All is good there. Now I do
sa['ctr-test']=devices['ctr']
which leads to
>print(sa)
      searchAppearance  clicks  impressions       ctr   position  ctr-test
0      AMP_TOP_STORIES     376          376  0.022917   8.108978  0.039522
1        AMP_BLUE_LINK   55670        55670  0.051522  13.158574  0.026543
2      PAGE_EXPERIENCE   68446        68446  0.039298  20.056293  0.051098
3       RECIPE_FEATURE   40175        40175  0.042920   4.186674       NaN
4  RECIPE_RICH_SNIPPET   37428        37428  0.069153  18.726152       NaN
5                VIDEO      72           72  0.025361  15.896090       NaN
6              WEBLITE       1            1  0.001055  51.493671       NaN
Do you see all the NaN values, but only from index 3 (the fourth row) onward? It does not make any sense to me. The DataFrame info still looks good:
>sa.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   searchAppearance  7 non-null      object
 1   clicks            7 non-null      int64
 2   impressions       7 non-null      int64
 3   ctr               7 non-null      float64
 4   position          7 non-null      float64
 5   ctr-test          3 non-null      float64
dtypes: float64(3), int64(2), object(1)
memory usage: 464.0 bytes
I don't get it. What is going wrong? I am using Google Colab.
It seems like a bug, but not in my code? Any idea how to debug this (if it's not in my code)?
CodePudding user response:
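The output of sa.info() includes this line:
 5   ctr-test          3 non-null      float64
So the new column holds only three non-null values, and they end up in the first three rows of sa. That would fit devices['ctr'] having only three rows (index 0 to 2): pandas aligns an assigned Series on its index labels, not on position, so every row of sa without a matching label is filled with NaN.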
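A minimal sketch of how this pattern can arise, assuming devices is a separate DataFrame with only three rows. The question never shows devices, so that shape is a guess; the three values are simply the ones visible in the ctr-test column above, and sa is cut down to a single column for brevity.

```python
import pandas as pd

# Stand-in for the question's `sa`: 7 rows, default RangeIndex 0..6.
sa = pd.DataFrame(
    {"ctr": [0.022917, 0.051522, 0.039298, 0.042920, 0.069153, 0.025361, 0.001055]}
)

# ASSUMPTION: `devices` has only 3 rows (index 0..2); its real contents are unknown.
devices = pd.DataFrame({"ctr": [0.039522, 0.026543, 0.051098]})

# Assigning a Series as a column aligns on index labels, not on position.
# Labels 3..6 do not exist in devices["ctr"], so those rows become NaN.
sa["ctr-test"] = devices["ctr"]

print(sa)
print("non-null values in ctr-test:", sa["ctr-test"].notna().sum())
```

Comparing len(sa) with len(devices), or sa.index with devices.index, before the assignment is a quick way to spot this kind of mismatch.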
CodePudding user response:
sa['ctr-test']=devices['ctr']
Stupidity on my side: I mixed up the DataFrames / variables.
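One way to catch such a mix-up early is to refuse the assignment when the two indexes differ. The helper below is hypothetical (assign_checked is not a pandas function) and the sample values are made up; it is only a sketch of the idea.

```python
import pandas as pd

def assign_checked(target, name, values):
    """Hypothetical helper: add `values` as column `name` only if the indexes match."""
    if not target.index.equals(values.index):
        raise ValueError(
            f"index mismatch: target has {len(target.index)} labels, "
            f"values has {len(values.index)}"
        )
    target[name] = values
    return target

# Made-up data: `devices` is deliberately shorter than `sa`.
sa = pd.DataFrame({"ctr": [0.1, 0.2, 0.3, 0.4]})
devices = pd.DataFrame({"ctr": [0.5, 0.6]})

try:
    assign_checked(sa, "ctr-test", devices["ctr"])
except ValueError as err:
    # The column is not added; the error points at the differing lengths.
    print(err)
```

Raising instead of silently filling with NaN is a design choice that suits the case here, where a shorter Series almost certainly means the wrong variable was used.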