Copying a Pandas DataFrame Column Leads to NaN

Time:04-22

I have this very simple Python Pandas DataFrame called "sa"

>sa.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   searchAppearance  7 non-null      object 
 1   clicks            7 non-null      int64  
 2   impressions       7 non-null      int64  
 3   ctr               7 non-null      float64
 4   position          7 non-null      float64
dtypes: float64(2), int64(2), object(1)
memory usage: 408.0  bytes

with these values

>print(sa)
          searchAppearance  clicks  impressions       ctr   position
0      AMP_TOP_STORIES     376          376  0.022917   8.108978
1        AMP_BLUE_LINK   55670        55670  0.051522  13.158574
2      PAGE_EXPERIENCE   68446        68446  0.039298  20.056293
3       RECIPE_FEATURE   40175        40175  0.042920   4.186674
4  RECIPE_RICH_SNIPPET   37428        37428  0.069153  18.726152
5                VIDEO      72           72  0.025361  15.896090
6              WEBLITE       1            1  0.001055  51.493671

all is good there.

now I do

sa['ctr-test']=devices['ctr']

this leads to

>print(sa)
          searchAppearance  clicks  impressions       ctr   position  ctr-test
0      AMP_TOP_STORIES     376          376  0.022917   8.108978  0.039522
1        AMP_BLUE_LINK   55670        55670  0.051522  13.158574  0.026543
2      PAGE_EXPERIENCE   68446        68446  0.039298  20.056293  0.051098
3       RECIPE_FEATURE   40175        40175  0.042920   4.186674       NaN
4  RECIPE_RICH_SNIPPET   37428        37428  0.069153  18.726152       NaN
5                VIDEO      72           72  0.025361  15.896090       NaN
6              WEBLITE       1            1  0.001055  51.493671       NaN

Do you see all the NaN values? They only start at the fourth row (index 3), which does not make any sense to me. The DataFrame info still looks good:

>sa.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   searchAppearance  7 non-null      object 
 1   clicks            7 non-null      int64  
 2   impressions       7 non-null      int64  
 3   ctr               7 non-null      float64
 4   position          7 non-null      float64
 5   ctr-test          3 non-null      float64
dtypes: float64(3), int64(2), object(1)
memory usage: 464.0  bytes

I don't get it. What is going wrong? I am using Google Colab.

Seems like a bug, but not in my code? Any idea on how to debug this? (If it's not in my code.)

CodePudding user response:

The output of sa.info() includes this line:

5   ctr-test          3 non-null      float64

It seems that these three non-null values end up in the first three rows of sa.
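
A likely explanation, assuming devices is a separate DataFrame with only three rows (the question never shows it): when a Series is assigned as a column, pandas aligns it on index labels rather than on position, so any label in sa that is missing from devices gets NaN. A minimal sketch with made-up stand-in data:

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question
sa = pd.DataFrame({"ctr": [0.02, 0.05, 0.04, 0.04, 0.07, 0.03, 0.001]})
devices = pd.DataFrame({"ctr": [0.039522, 0.026543, 0.051098]})

# Column assignment aligns on index labels, not on row position:
# devices only has labels 0, 1, 2, so labels 3..6 in sa get NaN
sa["ctr-test"] = devices["ctr"]

non_null = int(sa["ctr-test"].notna().sum())
missing_labels = sa.index[sa["ctr-test"].isna()].tolist()
print(non_null, missing_labels)
```

If that assumption holds, this is documented pandas behaviour rather than a bug in pandas or Colab.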

CodePudding user response:

sa['ctr-test']=devices['ctr']

My mistake: I mixed up the DataFrames/variables. The column was taken from devices instead of sa.
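
For anyone hitting the same symptom, one way to catch such a mix-up is to compare the two indexes before assigning. This is only a sketch: the data is invented, and it assumes the intent was to copy the ctr column within sa itself.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question
sa = pd.DataFrame({"ctr": [0.02, 0.05, 0.04, 0.04, 0.07, 0.03, 0.001]})
devices = pd.DataFrame({"ctr": [0.039522, 0.026543, 0.051098]})

# Labels of sa that have no counterpart in devices would become NaN
unmatched = sa.index.difference(devices.index)
print(len(unmatched), "rows of sa have no match in devices")

# Assumed intent: copy the column from the same frame
sa["ctr-test"] = sa["ctr"]
print(sa["ctr-test"].isna().sum())
```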
