I need to write a function.
It takes any value from the dataset as input and should find every row that intersects with it, either directly or through a value shared with another matching row.
For example: phone = 87778885566
The table is represented by the following fields:
- key
- id
- phone
- email
Test data:
- 1; 12345; 89997776655; [email protected]
- 2; 54321; 87778885566; [email protected]
- 3; 98765; 87776664577; [email protected]
- 4; 66678; 87778885566; [email protected]
- 5; 34567; 84547895566; [email protected]
- 6; 34567; 89087545678; [email protected]
The output should be:
- 2; 54321; 87778885566; [email protected]
- 4; 66678; 87778885566; [email protected]
- 5; 34567; 84547895566; [email protected]
- 6; 34567; 89087545678; [email protected]
It should check all values and, if values intersect anywhere, return a dataset containing the intersecting rows.
CodePudding user response:
I could not tell from your description how a phone number is supposed to produce intersections on the key or email columns, so I wrote two functions:
import pandas as pd

# Pass the DataFrame and a list of the values you want to intersect with.
def get_intersections(data: pd.DataFrame, kw: list):
    values = data.to_numpy()
    intersected_data = []
    for i in values:
        # Keep the row if it contains at least one of the keywords.
        if set(kw).intersection(i):
            intersected_data.append(tuple(i))
    return pd.DataFrame(set(intersected_data), columns=data.columns)
df >>
   key     id        phone              email
0    1  12345  89997776655  [email protected]
1    2  54321  87778885566  [email protected]
2    3  98765  87776664577  [email protected]
3    4  66678  87778885566  [email protected]
4    5  34567  84547895566  [email protected]
5    6  34567  89087545678  [email protected]
get_intersections(df,['87778885566','[email protected]']).sort_values(by='key').reset_index(drop=True)
>>
   key     id        phone              email
0    2  54321  87778885566  [email protected]
1    4  66678  87778885566  [email protected]
2    5  34567  84547895566  [email protected]
The second function compares the rows of your dataframe pairwise and keeps every row that shares a value with a different row:
def get_intersections(data):
    values = data.to_numpy()
    intersected_data = []
    for i in values:
        for j in values:
            # Skip rows with identical values; keep i if it shares a value with j.
            if set(i) != set(j) and set(i).intersection(j):
                intersected_data.append(tuple(i))
    # The set drops rows that were appended more than once.
    return pd.DataFrame(set(intersected_data), columns=data.columns)
get_intersections(df).sort_values(by='key').reset_index(drop=True)
>>
   key     id        phone              email
0    2  54321  87778885566  [email protected]
1    4  66678  87778885566  [email protected]
2    5  34567  84547895566  [email protected]
3    6  34567  89087545678  [email protected]
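If only same-column matches matter (a phone equal to another phone, an id equal to another id), a vectorized variant may be enough. This is a minimal sketch and not equivalent to the function above, which also compares values across different columns. `rows_sharing_a_value` is a hypothetical name, and the email addresses are placeholders I made up because the real ones are not readable in the post.

```python
import pandas as pd

# Sample data modelled on the question; the email values are invented placeholders.
df = pd.DataFrame({
    "key": ["1", "2", "3", "4", "5", "6"],
    "id": ["12345", "54321", "98765", "66678", "34567", "34567"],
    "phone": ["89997776655", "87778885566", "87776664577",
              "87778885566", "84547895566", "89087545678"],
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "b@example.com", "e@example.com"],
})

def rows_sharing_a_value(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Keep rows whose value in any of `columns` also occurs in another row."""
    # duplicated(keep=False) flags every occurrence of a repeated value.
    repeated = data[columns].apply(lambda s: s.duplicated(keep=False))
    return data[repeated.any(axis=1)]

shared = rows_sharing_a_value(df, ["id", "phone", "email"])
print(shared["key"].tolist())  # ['2', '4', '5', '6']
```

The key column is left out of `columns` on purpose, since every key is unique and can never produce a match.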
CodePudding user response:
You could use recursion:
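import numpy as np

def relation(dat, values):
    # Mark every cell whose value is in the current set of known values.
    d = dat.apply(lambda x: x.isin(values.ravel()))
    # Select the rows that contain at least one marked cell.
    values1 = dat.iloc[np.unique(np.where(d)[0]), :]
    # Stop once the selected rows contribute no new values.
    if set(np.array(values)) == set(values1.to_numpy().ravel()):
        return values1
    else:
        return relation(dat, values1.to_numpy().ravel())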
relation(df.astype(str), np.array(['87778885566']))
       1            2                  3
1  54321  87778885566  [email protected]
3  66678  87778885566  [email protected]
4  34567  84547895566  [email protected]
5  34567  89087545678  [email protected]
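The same expansion can also be written as a loop, which avoids hitting Python's recursion limit on long chains of linked rows. A minimal sketch under assumptions: `connected_rows` and its `columns` parameter are hypothetical names, all values are stored as strings, and the email addresses are placeholders I made up because the real ones are not readable in the post.

```python
import pandas as pd

# Sample data modelled on the question; the email values are invented placeholders.
df = pd.DataFrame({
    "key": ["1", "2", "3", "4", "5", "6"],
    "id": ["12345", "54321", "98765", "66678", "34567", "34567"],
    "phone": ["89997776655", "87778885566", "87776664577",
              "87778885566", "84547895566", "89087545678"],
    "email": ["a@example.com", "b@example.com", "c@example.com",
              "d@example.com", "b@example.com", "e@example.com"],
})

def connected_rows(data: pd.DataFrame, seeds, columns=None) -> pd.DataFrame:
    """Return every row linked to `seeds` through a chain of shared values."""
    cols = list(columns) if columns is not None else list(data.columns)
    known = set(seeds)
    while True:
        # Rows in which at least one of the chosen columns holds a known value.
        mask = data[cols].isin(list(known)).any(axis=1)
        found = set(data.loc[mask, cols].to_numpy().ravel())
        if found <= known:
            # No new values appeared, so the selection is complete.
            return data[mask]
        known |= found

result = connected_rows(df, ["87778885566"], columns=["id", "phone", "email"])
print(result["key"].tolist())  # ['2', '4', '5', '6']
```

Starting from the phone number, the loop first picks up rows 2 and 4, then row 5 through the shared email, then row 6 through the shared id, and stops when a pass adds nothing new.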