I have two dataframes df
and tf
like as given below
df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},{"unique_key": 3, "test_ids":np.nan},
{"unique_key": 4, "test_ids":np.nan}]
df = pd.DataFrame(df)
test_ids,status,revenue,cnt_days
1,passed,234.54,3
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6
51,passed,232,21
75,failed,123.87,32
tf = pd.read_clipboard(sep=',')
I would like to link the unique_key
column from df
to the tf
dataframe
For ex: I will show my output below (that's easy to understand than text)
I was trying something like below
for b in df.test_ids.tolist():
for a in b.split(','):
if a >= 0: # to exclude NA values from checking
for i in len(test_ids):
if int(a) == tf['test_ids'][i]:
tf['unique_key'] = df['unique_key']
But this is neither efficient nor elegant to solve my problem.
Is there any other better way to achieve the expected output shown below?
CodePudding user response:
You can create Series
with remove duplicates and missing values, swap to dictioanry and for new first column use DataFrame.insert
with Series.map
:
s = (df.set_index('unique_key')['test_ids']
.str.split(',')
.explode()
.astype(float)
.dropna()
.astype(int)
.drop_duplicates()
d = {v: k for k, v in s.items()}
print (d)
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}
tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
unique_key test_ids status revenue cnt_days
0 1 1 passed 234.54 3
1 1 2 passed 543.21 5
2 2 11 failed 21.30 4
3 1 15 failed 2098.21 6
4 2 51 passed 232.00 21
5 2 75 failed 123.87 32
Another idea is working with DataFrame
and create Series
for mapping:
s = (df.assign(new = df['test_ids'].str.split(','))
.explode('new')
.astype({'new':float})
.dropna(subset=['new'])
.astype({'new':int})
.drop_duplicates(subset=['new'])
.set_index('new')['unique_key'])
print (s)
new
1 1
15 1
2 1
51 2
75 2
11 2
Name: unique_key, dtype: int64
tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
unique_key test_ids status revenue cnt_days
0 1 1 passed 234.54 3
1 1 2 passed 543.21 5
2 2 11 failed 21.30 4
3 1 15 failed 2098.21 6
4 2 51 passed 232.00 21
5 2 75 failed 123.87 32