I have this simplified DataFrame:
A | B |
---|---|
foo | A, B, C, D |
I create a dictionary d:
d = {'foo': 'A, B, C'}
The dictionary keys appear in column A and their values correspond to column B. How can I remove any substrings in column B that aren't part of the dictionary value for that row's key?
Desired DataFrame:
A | B |
---|---|
foo | A, B, C |
CodePudding user response:
If you need to compare the values after splitting them on ", ", use:
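d = {'foo': 'A, B, C'}
# Keep only the items of B that also appear in the dictionary value (held temporarily in A)
f = lambda x: ', '.join(y for y in x.B.split(', ') if y in x.A.split(', '))
# Swap A for its dictionary value on a temporary copy, then filter B row by row
df['B'] = df.assign(A = df['A'].map(d)).apply(f, axis=1)
print(df)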
A B
0 foo A, B, C
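The snippet above assumes every value in column A is a key of d; for a missing key, map returns NaN and the split call fails. A minimal sketch of one way to handle that case follows. The helper name keep_allowed, the extra "bar" row, and the choice to leave unmatched rows unchanged are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data: "bar" has no entry in d (an assumption added for illustration)
df = pd.DataFrame({"A": ["foo", "bar"], "B": ["A, B, C, D", "X, Y, Z"]})
d = {"foo": "A, B, C"}

def keep_allowed(row):
    """Keep only the items of B that appear in the dictionary value for A."""
    allowed = d.get(row["A"])
    if allowed is None:
        # Assumption: rows whose key is missing from d are left unchanged
        return row["B"]
    allowed_items = set(allowed.split(", "))
    # Preserve the original order of the items in B
    return ", ".join(item for item in row["B"].split(", ") if item in allowed_items)

df["B"] = df.apply(keep_allowed, axis=1)
print(df)
```

With this sample data, the "foo" row loses D and the "bar" row keeps its original value.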
CodePudding user response:
I may have misunderstood, but if you want to remove "any substrings that aren't part of my dictionary key value", you probably want to keep only the values from your dictionary.
Suppose the dataframe below:
>>> df
A B
0 foo A, B, C, D
1 bar X, Y, Z
Update the values from your dict (rows are matched by index label, not by key):
df.update(pd.DataFrame(d.items(), columns=df.columns))
Output result:
>>> df
A B
0 foo A, B, C
1 bar X, Y, Z
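If the rows are not in the same order as the dictionary, a key-based lookup is one alternative. The sketch below looks up each value of column A in d and falls back to the existing B value when the key is absent. The reordered sample rows are an assumption for illustration.

```python
import pandas as pd

# Sample data with "foo" deliberately not in the first row
df = pd.DataFrame({"A": ["bar", "foo"], "B": ["X, Y, Z", "A, B, C, D"]})
d = {"foo": "A, B, C"}

# Look up each key from column A in d; keys missing from d produce NaN,
# which fillna replaces with the original value of B
df["B"] = df["A"].map(d).fillna(df["B"])
print(df)
```

Like the update call, this replaces the whole B value with the dictionary value rather than filtering individual items.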