I want to selectively change column values to np.nan.
I have a column with a lot of zero (0) values.
I am getting the row indices of a subset of the total.
I place the indices into a variable (s0).
I then use this to set the column value to np.nan for just the rows whose index is in s0.
It runs, but it is changing every single row (i.e., the entire column) to np.nan.
Here is my code:
print((df3['amount_tsh'] == 0).sum()) # 41639 <-- there are this many zeros to start
# print(df3['amount_tsh'].value_counts()[0])
s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index # grab 37322 row indexes
print(len(s0)) # 37322
df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan # change the value in the column to np.nan if it's index is in s0
print(df3['amount_tsh'].isnull().sum())
CodePudding user response:
Lets try
s0 = df3.loc[df3['amount_tsh'].eq(0), ['amount_tsh']].sample(37322)
df3.loc[df3.index.isin(s0.index), 'amount_tsh'] = np.nan
For a quick fix I used this data I had in a notebook and it worked for me
import pandas as pd
import numpy as np
data = pd.DataFrame({'Symbol': {0: 'ABNB', 1: 'DKNG', 2: 'EXPE', 3: 'MPNGF', 4: 'RDFN', 5: 'ROKU', 6: 'VIACA', 7: 'Z'},
'Number of Buys': {0: np.nan, 1: 2.0, 2: np.nan, 3: 1.0, 4: 2.0, 5: 1.0, 6: 1.0, 7: np.nan},
'Number of Sell s': {0: 1.0, 1: np.nan, 2: 1.0, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: 1.0},
'Gains/Losses': {0: 2106.0, 1: -1479.2, 2: 1863.18, 3: -1980.0, 4: -1687.7, 5: -1520.52, 6: -1282.4, 7: 1624.59}, 'Percentage change': {0: 0.0, 1: 2.0, 2: 0.0, 3: 0.0, 4: 1.5, 5: 0.0, 6: 0.0, 7: 0.0}})
rows = ['ABNB','DKNG','EXPE']
data
Symbol Number of Buys Number of Sell s Gains/Losses \
0 ABNB NaN 1.0 2106.00
1 DKNG 2.0 NaN -1479.20
2 EXPE NaN 1.0 1863.18
3 MPNGF 1.0 NaN -1980.00
4 RDFN 2.0 NaN -1687.70
5 ROKU 1.0 NaN -1520.52
6 VIACA 1.0 NaN -1282.40
7 Z NaN 1.0 1624.59
Percentage change
0 0.0
1 2.0
2 0.0
3 0.0
4 1.5
5 0.0
6 0.0
7 0.0
By your approach
(data['Number of Buys']==1.0).sum()
s0= data.loc[(data['Number of Buys']==1.0),['Number of Buys']].sample(2)
data.loc[data.index.isin(s0.index),'Number of Buys'] =np.nan
Symbol Number of Buys Number of Sell s Gains/Losses \
0 ABNB NaN 1.0 2106.00
1 DKNG 2.0 NaN -1479.20
2 EXPE NaN 1.0 1863.18
3 MPNGF 1.0 NaN -1980.00
4 RDFN 2.0 NaN -1687.70
5 ROKU NaN NaN -1520.52
6 VIACA NaN NaN -1282.40
7 Z NaN 1.0 1624.59
Percentage change
0 0.0
1 2.0
2 0.0
3 0.0
4 1.5
5 0.0
6 0.0
7 0.0
CodePudding user response:
Hmm...
I removed the re-assignment and it worked??
s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan
The second line was: df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan