Suppose I have a simple dataframe with four columns: Food, Kitchen, City, and Detail.
import pandas as pd

d = {'Food': ['P1|0', 'P2', 'P3|45', 'P1', 'P2', 'P4', 'P1|1', 'P3|7', 'P5', 'P1||23'],
     'Kitchen': ['L1', 'L2', 'L9', 'L4', 'L5', 'L6', 'L1', 'L9', 'L10', 'L1'],
     'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
     'Detail': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd0']}
df = pd.DataFrame(data=d)
My goal is to take the part of each Food value before the first | and build a new dataframe that shows which kitchens produce similar foods. I consider two rows similar when they share the same substring and the same Kitchen.
df['Food'] = df['Food'].apply(str)
df.insert(0,'subFood',df['Food'].str.split('|').str[0])
df.iloc[: , :2]
   subFood    Food
0       P1    P1|0
1       P2      P2
2       P3   P3|45
3       P1      P1
4       P2      P2
5       P4      P4
6       P1    P1|1
7       P3    P3|7
8       P5      P5
9       P1  P1||23
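To make it clearer why P1||23 also ends up as P1, here is a minimal sketch of what the split does on a few of the values. The names food, parts and sub_food are only illustrative and not part of my real code.

```python
import pandas as pd

# A few representative Food values, including the double-pipe case
food = pd.Series(['P1|0', 'P2', 'P1||23'])

# A single-character pattern is treated as a literal '|', not as a regex
parts = food.str.split('|')

# .str[0] takes the first element of each resulting list
sub_food = parts.str[0]

print(parts.tolist())     # the double pipe yields an empty string in the middle
print(sub_food.tolist())  # everything before the first '|'
```

Values without a | come back unchanged, because the split returns a one-element list for them.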
To do so, I use the merge function together with query.
df.merge(df, on=['subFood', 'Kitchen'], suffixes=('_1', '_2')).query('City_1 != City_2')
    subFood  Food_1  Kitchen  City_1  Detail_1  Food_2  City_2  Detail_2
 1       P1    P1|0       L1       A        d1    P1|1       C        d7
 2       P1    P1|0       L1       A        d1  P1||23       D        d0
 3       P1    P1|1       L1       C        d7    P1|0       A        d1
 5       P1    P1|1       L1       C        d7  P1||23       D        d0
 6       P1  P1||23       L1       D        d0    P1|0       A        d1
 7       P1  P1||23       L1       D        d0    P1|1       C        d7
11       P3   P3|45       L9       A        d3    P3|7       C        d8
12       P3    P3|7       L9       C        d8   P3|45       A        d3
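As far as I understand it, the self-merge pairs every row with every row that shares the same subFood and Kitchen (itself included), and the query then drops the pairs from the same city. A small sketch on a reduced, made-up frame (small, pairs and cross_city are hypothetical names) seems to confirm this:

```python
import pandas as pd

# Reduced, illustrative frame: three rows share (subFood, Kitchen) = ('P1', 'L1')
small = pd.DataFrame({
    'subFood': ['P1', 'P1', 'P1', 'P2'],
    'Kitchen': ['L1', 'L1', 'L1', 'L2'],
    'City':    ['A', 'C', 'D', 'A'],
})

# Self-merge: 3 x 3 pairs for the P1/L1 group plus 1 self-pair for P2/L2
pairs = small.merge(small, on=['subFood', 'Kitchen'], suffixes=('_1', '_2'))
print(len(pairs))

# Keep only pairs from different cities; each pair survives in both orders
cross_city = pairs.query('City_1 != City_2')
print(len(cross_city))
```

So every match shows up twice, once as (A, C) and once as (C, A), which is why the result above has eight rows.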
I got stuck here. I would like to end up with a dataframe similar to the one shown below. I'd appreciate any help or hints.
subFood  Food_1  Food_2  Kitchen  City  Detail
     P1    P1|0    P1|0       L1     A      d1
     P1    P1|0    P1|1       L1     C      d1
....
CodePudding user response:
IIUC, you can split each row into two rows by combining the two city columns into a list and then using explode:
merged = df.merge(df, on=["subFood","Kitchen"], suffixes=("_1","_2")).query("City_1 != City_2")
merged["City"] = merged[["City_1","City_2"]].to_numpy().tolist()
output = merged.drop(["City_1","City_2","Detail_2"],axis=1).explode("City").rename(columns={"Detail_1":"Detail"})
>>> output
    subFood  Food_1  Kitchen  Detail  Food_2  City
 1       P1    P1|0       L1      d1    P1|1     A
 1       P1    P1|0       L1      d1    P1|1     C
 2       P1    P1|0       L1      d1  P1||23     A
 2       P1    P1|0       L1      d1  P1||23     D
 3       P1    P1|1       L1      d7    P1|0     C
 3       P1    P1|1       L1      d7    P1|0     A
 5       P1    P1|1       L1      d7  P1||23     C
 5       P1    P1|1       L1      d7  P1||23     D
 6       P1  P1||23       L1      d0    P1|0     D
 6       P1  P1||23       L1      d0    P1|0     A
 7       P1  P1||23       L1      d0    P1|1     D
 7       P1  P1||23       L1      d0    P1|1     C
11       P3   P3|45       L9      d3    P3|7     A
11       P3   P3|45       L9      d3    P3|7     C
12       P3    P3|7       L9      d8   P3|45     C
12       P3    P3|7       L9      d8   P3|45     A
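If the list-then-explode step is not obvious, here is a minimal, self-contained sketch of just that part. The three-row merged frame is hand-written for illustration rather than produced by the merge, and the final reset_index is an optional extra I am assuming you might want, since explode repeats the original index labels.

```python
import pandas as pd

# Hand-written stand-in for a few rows of the merged frame (illustrative only)
merged = pd.DataFrame({
    'Food_1': ['P1|0', 'P1|0', 'P3|45'],
    'Food_2': ['P1|1', 'P1||23', 'P3|7'],
    'City_1': ['A', 'A', 'A'],
    'City_2': ['C', 'D', 'C'],
})

# Pack both city columns into one list-valued column, e.g. ['A', 'C']
merged['City'] = merged[['City_1', 'City_2']].to_numpy().tolist()

# explode emits one row per list element and repeats the index label
output = merged.drop(columns=['City_1', 'City_2']).explode('City')
print(output)

# Optional: swap the repeated labels for a fresh 0..n-1 index
output = output.reset_index(drop=True)
```

Each input row becomes two output rows, one per city, with the other columns copied unchanged.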