Sort a pandas df by trailing characters with variable whtespaces-CodePudding

I need to filter data by the end character in a list of strings. The problem I have is that the strings have a variable number of trailing whitespaces so when I use endswith('_h') to separate the groups, the names that have a couple of whitespaces after them are not captured.

How can I filter based on ends with certain characters but allow for the variable whitespace

for example, I want to sort a list of strings by the trailing characters into groupd _d, _h and dt: I have a df with the following

'''

obs_d = [  
"48973a_d   ",   
"48973a_h   ",   
"48973adt   ",   
"art_223wtb_d",
"art_223wtb_h",
"art_223wtbdt"]

values = [
1.3664,
1.02E-02,
191.84,
1.39E-04,
1.53E-02,
14.267]

d = {'obs': obs_d, 'vals': values}
df = pd.DataFrame(data=d)

hds = {}
ddns = {}
dt = {}
for r in range(len(df)):
    if df.iloc[r]['obs'].endswith('_d'):
        ddns[df.iloc[r]['obs']] = df.iloc[r]['vals']'''

Only the art_223wtb_d value is captured because it has no trailing whitespaces. How can I solve this? THanks

CodePudding user response：

for r in range(len(df)):
    if df.iloc[r]['obs'].rstrip().endswith('_d'):
        ddns[df.iloc[r]['obs']] = df.iloc[r]['vals']

point: with pandas don't use for loop any more:

df[df.obs.str.rstrip().str.endswith('_d')]

and if you want a dict:

dict(df[df.obs.str.rstrip().str.endswith('_d')].values.tolist())

and for all end chars:

df.groupby('end_obs').agg(list).apply(lambda x: dict(zip(x['obs'], x['vals'])), axis=1)

CodePudding user response：

I expect this to work in your case:

df.assign(postfix = lambda df: df.obs.str.rstrip().str[-2:]).set_index('postfix').sort_index()

Result:

                  obs        vals
postfix                          
_d        48973a_d       1.366400
_d       art_223wtb_d    0.000139
_h        48973a_h       0.010200
_h       art_223wtb_h    0.015300
dt        48973adt     191.840000
dt       art_223wtbdt   14.267000