So I have a data frame to which I want to apply a threshold: any value lower than 0.119 should be replaced with "NA", meaning I want to treat it as missing. When I run this code:
import pandas as pd
df = pd.read_csv("ups_core.csv")
for i in df.values:
    df.replace(i<0.119, "NA")
I get the error: TypeError: '<' not supported between instances of 'str' and 'float'. Can you help me figure out what I am doing wrong?
I will post a picture of part of the data frame. Thank you!
Edit: here is the output of df.head().to_dict('list'):
import pandas as pd
from numpy import nan  # to_dict() prints missing values as bare nan
df = pd.DataFrame({'gene.id': ['ENSG00000013275', 'ENSG00000053900', 'ENSG00000078140', 'ENSG00000078747', 'ENSG00000087191'], 'Adrenal Gland': [1.7052697835359134, 0.5864174746159394, 1.3103934038583631, 1.1328838852957983, 1.6132835184442524], 'Artery Aorta': [1.11728713807222, 0.7422617853145246, 1.5368751812880124, 1.3472335768656902, 1.0792282365044272], 'Artery Coronary': [1.4142135623730951, nan, 1.6934906247250543, 0.8408964152537145, 1.3947436663504054], 'Artery Tibial': [1.0069555500567189, nan, 1.7411011265922482, 0.8766057213160351, 1.0643701824533598], 'Brain Cerebellum': [0.7371346086455506, nan, 1.681792830507429, 1.11728713807222, 0.8408964152537145], 'Brain Cortex': [1.3947436663504054, 0.6155722066724582, 3.1601652474535085, 1.4742692172911012, 1.5368751812880124], 'Breast': [1.4845235706290492, 0.7071067811865476, 0.9659363289248456, 0.8950250709279725, 1.4044448757379973], 'Colon Sigmoid': [1.0570180405613805, 2.1584564730088545, 2.732080513508791, 1.086734862526058, 1.0792282365044272], 'Colon Transverse': [1.0210121257071934, 1.086734862526058, 2.027918959580058, 1.0570180405613805, 0.9330329915368074], 'GE junction': [1.1328838852957983, nan, 2.3133763678105748, 1.189207115002721, 1.1328838852957983], 'Esophagus Mucosa': [1.2834258975629045, 0.9592641193252645, 2.084931521682243, 1.4142135623730951, 1.3195079107728942], 'Esophagus Muscle': [1.0792282365044272, 1.905275996087875, 2.9485384345822023, 1.248330548901612, 1.1328838852957983], 'Heart Atrial': [1.6358041171155622, 0.9862327044933592, 2.329467172936912, 1.1566881839052874, 1.6132835184442524], 'Heart Ventricle': [1.827662900458801, 2.411615655381521, 2.5668517951258085, 1.0210121257071934, 1.7654059925813097], 'Liver': [1.6021397551792442, nan, 2.3456698984637576, 1.681792830507429, 1.7532114426320702], 'Lung': [1.0792282365044272, nan, 1.11728713807222, 1.0281138266560663, 1.1250584846888094], 'Minor Salivary': [1.3103934038583631, nan, 2.445280555384137, 0.8705505632961241, 1.2397076999389869], 'Muscle Skeletal': [2.0139111001134378, 0.5625292423444047, 2.3456698984637576, 1.4539725173203106, 2.0139111001134378], 'Nerve Tibial': [1.1974787046189286, 1.0570180405613805, 0.9201876506248752, 1.5583291593209998, 1.0570180405613805], 'Ovary': [0.9330329915368074, 0.8645372313078652, 0.7845840978967508, 1.0942937012607394, 1.0281138266560663], 'Pancreas': [1.248330548901612, 1.248330548901612, 1.515716566510398, 0.757858283255199, 1.214194884395047], 'Pituitary': [1.2397076999389869, 0.946057646725596, 2.23457427614444, 0.7737824967711949, 1.624504792712471], 'Prostate': [1.0281138266560663, nan, 2.8088897514759945, 1.0717734625362931, 1.1250584846888094], 'Skin Unexpo': [1.3660402567543954, nan, 1.4142135623730951, 0.9726549474122856, 1.2834258975629045], 'Skin SunExpo': [1.4640856959456254, nan, 1.6132835184442524, 1.0792282365044272, 1.4948492486349385], 'Small Intestine': [1.1407637158684236, 0.9794202975869268, 2.6026837108838667, 0.9265880618903708, 1.1328838852957983], 'Spleen': [1.1328838852957983, 0.993092495437036, 1.3566043274476718, 1.013959479790029, 1.109569472067845], 'Stomach': [1.148698354997035, 0.6597539553864471, 2.5491212546385245, 0.8526348917679567, 1.1647335864684558], 'Testis': [1.5052467474110671, nan, 1.0352649238413776, 1.0210121257071934, 1.4640856959456254], 'Thyroid': [0.946057646725596, 0.8705505632961241, 1.6358041171155622, 0.9794202975869268, 0.9726549474122856], 'Uterus': [0.8950250709279725, nan, 1.2226402776920684, 1.1647335864684558, 1.0069555500567189], 'Vagina': [1.0424657608411214, nan, 1.7411011265922482, 1.3103934038583631, 1.1407637158684236]})
Answer:
You should be able to use mask:
df.mask(df.lt(0.119))
It looks, however, like you have strings in the first column, so you should probably make that column the index:
df = df.set_index('gene.id')
df = df.mask(df.lt(0.119))  # mask returns a new DataFrame, so assign it back
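A self-contained sketch of this approach, using a small made-up frame in place of ups_core.csv (the gene IDs and values below are invented for illustration). Because mask fills the flagged cells with NaN, the columns stay numeric and pandas treats those cells as missing, which is usually more useful than the literal string "NA".

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data.
df = pd.DataFrame({
    "gene.id": ["g1", "g2", "g3"],
    "Adrenal Gland": [1.70, 0.05, 1.31],
    "Artery Aorta": [0.10, np.nan, 1.54],
})

# Move the string column into the index so every remaining column is numeric.
df = df.set_index("gene.id")

# Cells where the condition is True become NaN; existing NaN compare as False and stay NaN.
df = df.mask(df.lt(0.119))
print(df)
```

If the literal string is really wanted, df.mask(df.lt(0.119), "NA") passes it as the replacement value, but the affected columns then become object dtype instead of float.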
Answer:
I'm guessing that the type of i is string. Probably float(i) would solve the issue.
Answer:
df.replace(float(i)<0.119, "NA")
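Note that float() converts a single value, while i in the question's loop is a whole row. A sketch of a per-column alternative, swapping in pd.to_numeric(errors="coerce") (a substitution, not what these answers propose): it converts each data column to floats, turning anything unparseable into NaN, and then keeps only values at or above the threshold. The frame, including its "bad" entry, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one column was read as text, including an unparseable entry.
df = pd.DataFrame({
    "gene.id": ["g1", "g2", "g3"],
    "Liver": ["1.60", "0.05", "bad"],
    "Lung": [1.08, 0.11, np.nan],
})

for col in df.columns.drop("gene.id"):
    # Convert the whole column to floats; strings that are not numbers become NaN.
    numeric = pd.to_numeric(df[col], errors="coerce")
    # Keep values >= 0.119; everything else (including NaN) becomes NaN.
    df[col] = numeric.where(numeric >= 0.119)

print(df)
```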
Answer:
The error is indicating that there are strings in the dataframe. I'm not clear on why you're getting that error, since the elements of df.values are lists, so I would expect it to return an error from you trying to use < between lists and a float.
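A quick way to check what the loop iterates over, sketched with a made-up two-row frame: for a frame with mixed column types, df.values is a 2-D NumPy array of dtype object, so each i is one row stored as an array, and < is applied to every element of that row, including the string gene ID.

```python
import pandas as pd

df = pd.DataFrame({"gene.id": ["g1", "g2"], "Liver": [1.60, 0.05]})

values = df.values            # 2-D NumPy array; mixed column types give dtype=object
row = values[0]               # iterating over df.values yields one row at a time
print(type(row), row.dtype)

error_message = None
try:
    row < 0.119               # compared element by element, including the string "g1"
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```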
Regarding the first issue: your ID column contains strings. There are several things you can do to address this. One is to work with a slice of the dataframe that doesn't include that column. Another is to set that column as the index. A third is to replace your i<0.119 with something that first checks whether the value is a float and, if so, whether it is less than 0.119.
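A sketch of the first option on made-up data: select_dtypes picks out the numeric columns, so the threshold never touches gene.id. Setting the column as the index (the second option) works the same way, as in the mask answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gene.id": ["g1", "g2", "g3"],
    "Liver": [1.60, 0.05, np.nan],
    "Lung": [1.08, 0.11, 1.13],
})

# Operate only on the numeric columns, leaving "gene.id" untouched.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].mask(df[num_cols] < 0.119)
print(df)
```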
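Something that should address both issues is to use df.applymap(lambda x: isinstance(x, float) and x < .119) to create a mask (this won't catch anything that is stored as an int, however). In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which takes the same function.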
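A sketch of that element-wise mask applied end to end, on a made-up frame. The hasattr check picks DataFrame.map on pandas 2.1 and later and falls back to applymap on older versions; the resulting boolean frame is then passed to mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "gene.id": ["g1", "g2"],
    "Liver": [1.60, 0.05],
    "Lung": [np.nan, 0.11],
})

# DataFrame.map is the pandas >= 2.1 name for applymap; fall back on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# True only for float cells below the threshold; strings (and ints) are never flagged.
below = elementwise(lambda x: isinstance(x, float) and x < 0.119)

df = df.mask(below)
print(df)
```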