The dataframe
ColumnA TempB value
0 2149712 1431291 7.7
1 2149712 1421222 6.3
4 2149712 5212412 5.1
5 1421222 1451826 15.1
6 1421222 3124512 12.5
7 1421221 2141241 11.2
8 4312125 5212412 4.9
shows on ColumnA and TempB some identifiers (customers ID). I need to create a step of code that can allow users to select manually an ID (it does not matter if from ColumnA or TempB) and return all the possible connections with other identifies, if any. The results should be sorted by value (descending).
What I have tried so far is shown below:
select_id = input('Enter IDs to look for (separated by space): ')
ids_list = select_id.split()
pattern = '|'.join(r"\b{}\b".format(x) for x in ids_list)
df = df[(df['ColumnA'].str.contains(pattern)) | (df['TempB'].str.contains(pattern))]
df= df[["ColumnA", "TempB", "value"]].sort_values(by="value", ascending=False)
For instance: let's say that I select ID= 1421222. I would expect the following output:
ID Link value
1421222 1451826 15.1
1421222 3124512 12.5
1421222 2149712 6.3
or in case of multiple IDs (e.g. 1421222 and 4312125):
ID Link value
1421222 1451826 15.1
1421222 3124512 12.5
1421222 2149712 6.3
4312125 5212412 4.9
I think it is easier that what I think to generate the above outputs, but actually I am finding difficult to reproduce the logic of selection.
CodePudding user response:
subset = df[df[['ColumnA', 'TempB']].isin([1421222, 4312125]).sum(axis=1).astype(bool)]
Output:
>>> subset
ColumnA TempB value
1 2149712 1421222 6.3
5 1421222 1451826 15.1
6 1421222 3124512 12.5
8 4312125 5212412 4.9