I have a DataFrame which consists of 30 rows and 9 columns. I want to make a 2 sigma outlier removal.
I do it with this:
from scipy import stats
df[(np.abs(stats.zscore(df)) < 2).all(axis=1)]
But it removes the whole line if there is a outlier in a single column. I just want to get this single value deleted. How can I do this? And the first column contains the time. this should never be touched. How can I exclude this single column?
this is how the data looks like:
Trace for Mass: 60Ni 61Ni 62Ni 63Cu 64Ni 65Cu 66Zn
Resolution: High High High High High High High
Time Intensity Intensity Intensity Intensity Intensity Intensity Intensity
[sec] [cps] [cps] [cps] [cps] [cps] [cps] [cps]
0. 4.246875178068876e-003 4.550645244307816e-004 8.364085806533694e-004 3.21496045216918e-003 3.215973265469074e-003 1.595904817804694e-003 1.983924303203821e-003
1.051999807357788 4.264393821358681e-003 5.171436932869256e-004 8.292743586935103e-004 3.154967911541462e-003 3.216561861336231e-003 1.622977200895548e-003 1.874359208159149e-003
2.102999925613403 4.27544629201293e-003 4.796394787263125e-004 8.318902109749615e-004 3.211528761312366e-003 3.147452371194959e-003 1.622740761376917e-003 1.879810937680304e-003
3.154999971389771 4.278738517314196e-003 4.829006502404809e-004 7.972901221364737e-004 3.218628698959947e-003 3.22998408228159e-003 1.604416524060071e-003 1.938240835443139e-003
4.206999778747559 4.211603198200464e-003 4.424861108418554e-004 8.007381693460047e-004 3.2428870908916e-003 3.166524693369865e-003 1.590821426361799e-003 1.903632888570428e-003
5.257999897003174 4.267803858965635e-003 5.1306706154719e-004 8.309389813803136e-004 3.144200425595045e-003 3.117314074188471e-003 1.603707205504179e-003 1.815222087316215e-003
6.309999942779541 4.182798787951469e-003 5.052632768638432e-004 7.896805764175952e-004 3.130593337118626e-003 3.10095027089119e-003 1.570251770317555e-003 1.817710697650909e-003
7.361000061035156 4.296375438570976e-003 4.910536226816475e-004 8.9122453937307e-004 3.204192267730832e-003 3.028199542313814e-003 1.533132861368358e-003 1.788084045983851e-003
8.413000106811523 4.335530567914248e-003 6.025235052220523e-004 8.631621603854001e-004 3.268211148679256e-003 2.987353131175041e-003 1.608435995876789e-003 1.796260941773653e-003
9.463999748229981 4.290143493562937e-003 4.839488829020411e-004 8.525795419700444e-004 3.222533734515309e-003 3.005951410159469e-003 1.583610195666552e-003 1.700276043266058e-003
10.51599979400635 4.287909716367722e-003 5.497571546584368e-004 9.083477198146284e-004 3.219338599592447e-003 2.950039459392428e-003 1.682562520727515e-003 1.783343963325024e-003
11.56699943542481 4.260278772562742e-003 4.665948799811304e-004 7.738673011772335e-004 3.193542594090104e-003 2.853760728612542e-003 1.568833249621093e-003 1.736654434353113e-003
12.61899948120117 4.26474679261446e-003 5.00720867421478e-004 8.611407829448581e-004 3.217800287529826e-003 2.865647897124291e-003 1.595077337697148e-003 1.658685388974845e-003
13.67099952697754 4.222772549837828e-003 4.647313617169857e-004 8.633999968878925e-004 3.159464336931706e-003 2.801976399496198e-003 1.629361184313893e-003 1.673259655945003e-003
14.72200012207031 4.23405971378088e-003 4.880253691226244e-004 8.320091292262077e-004 3.10550956055522e-003 2.766199875622988e-003 1.57923623919487e-003 1.671363832429051e-003
15.77400016784668 4.263806156814098e-003 5.268111126497388e-004 8.335548918694258e-004 3.150589996948838e-003 2.747958991676569e-003 1.52225757483393e-003 1.638660905882716e-003
16.82500076293945 4.173276014626026e-003 5.153965321369469e-004 7.848058012314141e-004 3.132368205115199e-003 2.736426191404462e-003 1.501098275184631e-003 1.646955031901598e-003
17.87699890136719 4.209604579955339e-003 4.582091642078012e-004 7.977656787261367e-004 3.183129709213972e-003 2.714420203119516e-003 1.604771241545677e-003 1.606788486242294e-003
18.92900085449219 4.214542452245951e-003 4.919854109175503e-004 8.5032032802701e-004 3.177686594426632e-003 2.588512841612101e-003 1.560558215714991e-003 1.607973361387849e-003
19.97999954223633 4.171629901975393e-003 4.438837058842182e-004 8.449696470052004e-004 3.142070723697543e-003 2.649111207574606e-003 1.58833886962384e-003 1.547667197883129e-003
21.0310001373291 4.234999883919954e-003 5.094563821330667e-004 8.215457201004028e-004 3.189756069332361e-003 2.645698608830571e-003 1.556538976728916e-003 1.515797688625753e-003
22.08300018310547 4.159520845860243e-003 5.21336798556149e-004 7.7945546945557e-004 3.093914361670613e-003 2.504269825294614e-003 1.597914495505393e-003 1.550629152916372e-003
23.13399887084961 4.095097538083792e-003 5.284418002702296e-004 8.160762954503298e-004 3.164552384987474e-003 2.605574205517769e-003 1.5143376076594e-003 1.545534702017903e-003
24.18600082397461 4.190911073237658e-003 4.741653683595359e-004 8.253505802713335e-004 3.078178269788623e-003 2.457562601193786e-003 1.61718437448144e-003 1.502647297456861e-003
25.23799896240234 4.155758768320084e-003 4.477270995266736e-004 8.012137841433287e-004 3.119352972134948e-003 2.549331868067384e-003 1.551455701701343e-003 1.538307638838887e-003
26.28899955749512 4.055834375321865e-003 4.267746699042618e-004 8.247561054304242e-004 3.050019731745124e-003 2.364743268117309e-003 1.565523212775588e-003 1.418655156157911e-003
27.34099960327148 4.160813987255096e-003 4.637996316887438e-004 8.405701955780387e-004 3.15011665225029e-003 2.621341263875365e-003 1.558548538014293e-003 1.534871873445809e-003
28.39200019836426 4.123781807720661e-003 5.418366636149585e-004 8.308201213367283e-004 3.128936979919672e-003 2.427210099995136e-003 1.607372076250613e-003 1.475754892453551e-003
29.44400024414063 4.185620695352554e-003 4.987408174201846e-004 7.421225891448557e-004 3.080426249653101e-003 2.371448557823896e-003 1.567532890476286e-003 1.444243011064827e-003
30.49600028991699 4.092158749699593e-003 5.319360643625259e-004 8.368841372430325e-004 3.113200422376394e-003 2.385094529017806e-003 1.580300158821046e-003 1.433581346645951e-003
This file is read by:
pd.options.display.float_format = '{:.4f}'.format
data = pd.read_csv(dateiname, sep='\t', names=['Time', '60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn'], skiprows=6, nrows=30, index_col=False, dtype=float)
CodePudding user response:
If need replace outliers by missing values use DataFrame.mask
:
df = df.mask(np.abs(stats.zscore(df)) < 2)
#working for replace outlier by missing values
#df = df.mask(np.abs(stats.zscore(df)) < 2, np.nan)
I just want to get this single value deleted.
It is not possible, only remove row(s) like your solution.
CodePudding user response:
It would be better to provide your data, but IIUC, use mask
to mask your outliers with NaN
:
from scipy import stats
cols = list(df.drop(columns='Time').columns)
# or
# cols = ['60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn']
df[cols] = df[cols].mask(np.abs(stats.zscore(df[cols])) >= 2)
or with where
from scipy import stats
cols = list(df.drop(columns='Time').columns)
# or
# cols = ['60Ni', '61Ni', '62Ni', '63Cu', '64Ni', '65Cu', '66Zn']
df[cols] = df[cols].where(np.abs(stats.zscore(df[cols])) < 2)
output:
Time 60Ni 61Ni 62Ni 63Cu 64Ni 65Cu 66Zn
0 0.000000 0.004247 0.000455 0.000836 0.003215 0.003216 0.001596 0.001984
1 1.052000 0.004264 0.000517 0.000829 0.003155 0.003217 0.001623 0.001874
2 2.103000 0.004275 0.000480 0.000832 0.003212 0.003147 0.001623 0.001880
3 3.155000 0.004279 0.000483 0.000797 0.003219 0.003230 0.001604 0.001938
4 4.207000 0.004212 0.000442 0.000801 0.003243 0.003167 0.001591 0.001904
5 5.258000 0.004268 0.000513 0.000831 0.003144 0.003117 0.001604 0.001815
6 6.310000 0.004183 0.000505 0.000790 0.003131 0.003101 0.001570 0.001818
7 7.361000 0.004296 0.000491 0.000891 0.003204 0.003028 0.001533 0.001788
8 8.413000 0.004336 NaN 0.000863 NaN 0.002987 0.001608 0.001796
9 9.464000 0.004290 0.000484 0.000853 0.003223 0.003006 0.001584 0.001700
10 10.516000 0.004288 0.000550 NaN 0.003219 0.002950 NaN 0.001783
11 11.566999 0.004260 0.000467 0.000774 0.003194 0.002854 0.001569 0.001737
12 12.618999 0.004265 0.000501 0.000861 0.003218 0.002866 0.001595 0.001659
13 13.671000 0.004223 0.000465 0.000863 0.003159 0.002802 0.001629 0.001673
14 14.722000 0.004234 0.000488 0.000832 0.003106 0.002766 0.001579 0.001671
15 15.774000 0.004264 0.000527 0.000834 0.003151 0.002748 0.001522 0.001639
16 16.825001 0.004173 0.000515 0.000785 0.003132 0.002736 NaN 0.001647
17 17.876999 0.004210 0.000458 0.000798 0.003183 0.002714 0.001605 0.001607
18 18.929001 0.004215 0.000492 0.000850 0.003178 0.002589 0.001561 0.001608
19 19.980000 0.004172 0.000444 0.000845 0.003142 0.002649 0.001588 0.001548
20 21.031000 0.004235 0.000509 0.000822 0.003190 0.002646 0.001557 0.001516
21 22.083000 0.004160 0.000521 0.000779 0.003094 0.002504 0.001598 0.001551
22 23.133999 0.004095 0.000528 0.000816 0.003165 0.002606 0.001514 0.001546
23 24.186001 0.004191 0.000474 0.000825 0.003078 0.002458 0.001617 0.001503
24 25.237999 0.004156 0.000448 0.000801 0.003119 0.002549 0.001551 0.001538
25 26.289000 NaN 0.000427 0.000825 NaN 0.002365 0.001566 0.001419
26 27.341000 0.004161 0.000464 0.000841 0.003150 0.002621 0.001559 0.001535
27 28.392000 0.004124 0.000542 0.000831 0.003129 0.002427 0.001607 0.001476
28 29.444000 0.004186 0.000499 NaN 0.003080 0.002371 0.001568 0.001444
29 30.496000 0.004092 0.000532 0.000837 0.003113 0.002385 0.001580 0.001434