I am working on a project where my dataset looks like bellow:
Origin | Destination | distance |
---|---|---|
Hamburg | Frankfurt | 393.34 |
Hamburg | Cologne | 357.15 |
Cologne | Hamburg | 358.24 |
Frankfurt | Hamburg | 400.44 |
I am interested only on one way either "Hamburg - Frankfurt" or "Frankfurt - Hamburg". How can i filter this in pandas so that i have one of them in dataset?
Thanks :)
CodePudding user response:
Arpit Tiwari (Data Scientist)
import pandas as pd
dict1 = {'Origin':['Hamburg', 'Hamburg', 'Cologne', 'Frankfurt'],
'Destination':['Frankfurt', 'Cologne', 'Hamburg', "Hamburg"],
'distance':[393.34, 357.15, 358.24, 400.44]
}
X = pd.DataFrame(dict1)
## Please refer to this part
X['Key'] = (X['Origin'] X['Destination']).apply(lambda x:"".join(sorted(x)))
X=X.drop_duplicates(subset=['Key'],keep='first').drop('Key',axis=1)
CodePudding user response:
IIUC, you can try
cols = ['Origin', 'Destination']
vals = ['Hamburg', 'Frankfurt']
out = df[(df[cols] == vals).all(axis=1) | (df[cols] == vals[::-1]).all(axis=1)]
print(out)
Origin Destination distance
0 Hamburg Frankfurt 393.34
3 Frankfurt Hamburg 400.44