Find all rows whose column name contains a specific string

Time:09-28

I have a dataframe as shown below. It has 3 columns named "TTN_163_2.5_-40", "TTN_163_2.7_-40" and "TTN_163_3.6_-40".

I need to select the columns (with all their rows) whose names contain '2.5', '3.6' or '2.7'.

I also have some column names which contain 1.6, 1.62 and 1.656, and I need to select these separately. When I write df_psrr_funct_1V6.filter(regex='1\.6|^xvalues$') I get the columns corresponding to 1.6, 1.656 and 1.62 all together. I don't want this. May I know how to select each one uniquely?

I used this method (df_psrr_funct = df_psrr_funct.filter(regex='2.5')), but it does not capture the first column (xvalues).

Sample dataframe

xvalues TTN_163_2.5_-40     TTN_163_2.7_-40   TTN_163_3.6_-40   
23.0279  -58.7591            -58.5892           -60.0966    
30.5284  -58.6903             -57.3153          -59.9111    

May I know how to do this?

CodePudding user response:

Extend the regex with | (or). ^ anchors the start of the string and $ anchors the end, so ^xvalues$ selects only the column named exactly xvalues and skips column names that merely contain it, such as xvalues 1 or aaa xvalues:

df_psrr_funct = df_psrr_funct.filter(regex=r'2\.5|^xvalues$')
print (df_psrr_funct)
   xvalues  TTN_163_2.5_-40
0  23.0279         -58.7591
1  30.5284         -58.6903
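The question asks for several values at once ('2.5', '3.6', '2.7'). As a minimal sketch, assuming the values are kept in a Python list (the name `wanted` is made up for this example), the pattern can be built with `re.escape` so that each dot is treated literally:

```python
import re
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "xvalues": [23.0279, 30.5284],
    "TTN_163_2.5_-40": [-58.7591, -58.6903],
    "TTN_163_2.7_-40": [-58.5892, -57.3153],
    "TTN_163_3.6_-40": [-60.0966, -59.9111],
})

# Hypothetical list of substrings to look for in the column names.
wanted = ["2.5", "3.6"]

# re.escape turns "2.5" into "2\.5", so the dot no longer matches any character.
pattern = "^xvalues$|" + "|".join(re.escape(v) for v in wanted)

subset = df.filter(regex=pattern)
print(pattern)
print(list(subset.columns))
```

The columns come back in their original order, with xvalues kept by the anchored alternative.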

EDIT: If you need to match only the value between the underscores, use:

print (df_psrr_funct)
   xvalues  TTN_163_1.6_-40  TTN_163_1.62_-40  TTN_163_1.656_-40
0  23.0279         -58.7591          -58.5892           -60.0966
1  30.5284         -58.6903          -57.3153           -59.9111

df_psrr_funct = df_psrr_funct.filter(regex=r'_1\.6_|^xvalues$')
print (df_psrr_funct)
   xvalues  TTN_163_1.6_-40
0  23.0279         -58.7591
1  30.5284         -58.6903
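To see why the surrounding underscores matter, here is a small self-contained comparison. It assumes, as in the sample above, that the value is always enclosed by `_` on both sides; the variable names `loose` and `strict` are only for illustration:

```python
import pandas as pd

# Column names as in the EDIT above; values copied from the question.
df = pd.DataFrame({
    "xvalues": [23.0279, 30.5284],
    "TTN_163_1.6_-40": [-58.7591, -58.6903],
    "TTN_163_1.62_-40": [-58.5892, -57.3153],
    "TTN_163_1.656_-40": [-60.0966, -59.9111],
})

# "1\.6" is a substring of "1.62" and "1.656", so all three columns match.
loose = df.filter(regex=r"1\.6|^xvalues$")

# Requiring "_" on both sides leaves only the exact value 1.6.
strict = df.filter(regex=r"_1\.6_|^xvalues$")

print(list(loose.columns))
print(list(strict.columns))
```

If a value could also appear at the very end of a column name, the trailing `_` would have to be relaxed accordingly.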

CodePudding user response:

Another approach:

df_psrr_funct.filter(regex=r'^\D+$|2\.5')

   xvalues  TTN_163_2.5_-40
0  23.0279  -58.7591
1  30.5284  -58.6903
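Here `^\D+$` matches any column name that contains no digits at all, which is how xvalues gets picked up. The sketch below also shows why escaping the dot is safer. The column name `TTN_265_3.6_-40` is hypothetical and was added only to trigger the false match:

```python
import pandas as pd

df = pd.DataFrame({
    "xvalues": [23.0279, 30.5284],
    "TTN_163_2.5_-40": [-58.7591, -58.6903],
    "TTN_163_2.7_-40": [-58.5892, -57.3153],
    "TTN_265_3.6_-40": [-60.0966, -59.9111],  # hypothetical name, not in the question
})

# An unescaped "." matches any character, so "2.5" also matches "265".
unescaped = df.filter(regex=r"^\D+$|2.5")

# With "\." only a literal dot is accepted.
escaped = df.filter(regex=r"^\D+$|2\.5")

print(list(unescaped.columns))
print(list(escaped.columns))
```

Note that `^\D+$` keeps every digit-free column name, not just xvalues, so `^xvalues$` is the stricter choice when other such columns exist.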

CodePudding user response:

Using a regex for this doesn't make any sense... just do

columns_with_2point5 = [c for c in df.columns if "2.5" in c]
only_cool_cols = df[['xvalues'] + columns_with_2point5]

Don't overcomplicate it...

If you don't need the first column, you can just use filter with like instead of one of the regex solutions (see the first comment from @BeRT2me).
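A plain substring test has the same overlap problem as the loose regex: "1.6" is also found in "1.62". One possible way around it without a regex, assuming the parts of each column name are always separated by `_`, is to compare against the split tokens. The names `by_substring` and `by_token` are made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "xvalues": [23.0279, 30.5284],
    "TTN_163_1.6_-40": [-58.7591, -58.6903],
    "TTN_163_1.62_-40": [-58.5892, -57.3153],
    "TTN_163_2.5_-40": [-60.0966, -59.9111],
})

# Substring test: matches both the 1.6 and the 1.62 column.
by_substring = [c for c in df.columns if "1.6" in c]

# Token test: "TTN_163_1.6_-40".split("_") gives ["TTN", "163", "1.6", "-40"],
# so only an exact "1.6" token counts.
by_token = [c for c in df.columns if "1.6" in c.split("_")]

subset = df[["xvalues"] + by_token]

# filter(like=...) is a substring match on the labels and does not keep xvalues.
like_only = df.filter(like="2.5")

print(by_substring)
print(list(subset.columns))
print(list(like_only.columns))
```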
