I need to filter data by the end character in a list of strings. The problem I have is that the strings have a variable number of trailing whitespaces so when I use endswith('_h') to separate the groups, the names that have a couple of whitespaces after them are not captured.
How can I filter based on ends with certain characters but allow for the variable whitespace
for example, I want to sort a list of strings by the trailing characters into groupd _d, _h and dt: I have a df with the following
'''
obs_d = [
"48973a_d ",
"48973a_h ",
"48973adt ",
"art_223wtb_d",
"art_223wtb_h",
"art_223wtbdt"]
values = [
1.3664,
1.02E-02,
191.84,
1.39E-04,
1.53E-02,
14.267]
d = {'obs': obs_d, 'vals': values}
df = pd.DataFrame(data=d)
hds = {}
ddns = {}
dt = {}
for r in range(len(df)):
if df.iloc[r]['obs'].endswith('_d'):
ddns[df.iloc[r]['obs']] = df.iloc[r]['vals']'''
Only the art_223wtb_d value is captured because it has no trailing whitespaces. How can I solve this? THanks
CodePudding user response:
for r in range(len(df)):
if df.iloc[r]['obs'].rstrip().endswith('_d'):
ddns[df.iloc[r]['obs']] = df.iloc[r]['vals']
point: with pandas
don't use for
loop any more:
df[df.obs.str.rstrip().str.endswith('_d')]
and if you want a dict
:
dict(df[df.obs.str.rstrip().str.endswith('_d')].values.tolist())
and for all end char
s:
df.groupby('end_obs').agg(list).apply(lambda x: dict(zip(x['obs'], x['vals'])), axis=1)
CodePudding user response:
I expect this to work in your case:
df.assign(postfix = lambda df: df.obs.str.rstrip().str[-2:]).set_index('postfix').sort_index()
Result:
obs vals
postfix
_d 48973a_d 1.366400
_d art_223wtb_d 0.000139
_h 48973a_h 0.010200
_h art_223wtb_h 0.015300
dt 48973adt 191.840000
dt art_223wtbdt 14.267000