I have a dataframe with the top 12 predictions my kNN has made for each ID. It looks like this:
customer_id | prediction |
---|---|
00000dbacae5abe5e2 | [677530001, 677515001, 677511001, 677506003, 677501001, 677490001, 677478006, 677478003, 677478002, 677546006, 949551001, 903049003] |
0000423b00ade9141 | [677511001, 677506003, 677501001, 677490001, 677478006, 677478003, 677478002, 677386001, 677385001, 677760003, 949551001, 826674001] |
Is it possible to remove the square brackets from each row (the values are arrays) and also add a zero as a prefix to each prediction, like this:
customer_id | prediction |
---|---|
00000dbacae5abe5e2 | 0677530001, 0677515001, 0677511001..... |
0000423b00ade9141 | 0677511001, 0677506003, 0677501001..... |
My code for generating these predictions and the table:
import numpy as np
import pandas as pd

n = 12
# Class probabilities for each row of X.head()
probas = kNN.predict_proba(X.head())
# Column indices of the n highest probabilities per row (in ascending order)
top_n_idx = np.argsort(probas, axis=1)[:, -n:]
# Map the indices back to class labels
top_n = [kNN.classes_[i] for i in top_n_idx]
results = list(zip(top_n))
results = pd.DataFrame(results)
ids_test.reset_index(drop=True, inplace=True)
results.reset_index(drop=True, inplace=True)
y_test.reset_index(drop=True, inplace=True)
knn_table = pd.concat([ids, results], axis=1, ignore_index=True)
knn_table = knn_table.rename(columns={0: 'customer_id', 1: 'prediction'})
Answer:
Try:
df["prediction"] = ("0" + df["prediction"].explode().astype(str)).groupby(level=0).agg(", ".join)
Alternatively, with apply:
df["prediction"] = df["prediction"].apply(lambda x: "0" + ", 0".join(map(str, x)))
Output:
>>> df
customer_id prediction
0 00000dbacae5abe5e2 0677530001, 0677515001, 0677511001, 0677506003...
1 0000423b00ade9141 0677511001, 0677506003, 0677501001, 0677490001...
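For anyone who wants to try both approaches end to end, here is a minimal, self-contained sketch. The sample dataframe is hypothetical: it mirrors the layout from the question, with the lists shortened to three items, and it assumes each cell in `prediction` holds a list (or array) of integers. The names `via_explode` and `via_apply` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's table
# (lists shortened to three predictions each).
df = pd.DataFrame({
    "customer_id": ["00000dbacae5abe5e2", "0000423b00ade9141"],
    "prediction": [
        [677530001, 677515001, 677511001],
        [677511001, 677506003, 677501001],
    ],
})

# Approach 1: explode to one row per prediction, prefix each with "0",
# then join the strings back together per original row index.
exploded = "0" + df["prediction"].explode().astype(str)
via_explode = exploded.groupby(level=0).agg(", ".join)

# Approach 2: build the joined string row by row with apply.
via_apply = df["prediction"].apply(
    lambda preds: ", ".join("0" + str(p) for p in preds)
)

# Both approaches produce the same strings; keep one of them.
df["prediction"] = via_apply
print(df)
```

The `apply` variant here prefixes each element inside the join rather than prepending a single "0" to the joined string, so an empty list would give an empty string instead of a lone "0". For non-empty lists the result is the same as in the answer above.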