This is what I have:
df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
'Location': {0: ['Mark lives in UK',
'Rick lives in France',
'John Lives in US'],
1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})
This is what I'd like to get:
desired_output = pd.DataFrame({'Name': ['Mark', 'John', 'Rick'],
'Location':[['Mark lives in UK', 'Rick lives in France', 'John Lives in US'], ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'], ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']],
'Outcome': ['Mark lives in UK', 'John Lives in US', 'Rick lives in France']
})
Here is what I tried:
df['Sorted'] = df['Location'].str.split(',')
df.apply(lambda x: [idx for idx,s in enumerate(x.sorted) if x.Name in x.sorted])
Thank you in advance!
CodePudding user response:
If you don't need the Location column from the beginning you can use this:
df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
'Location': {0: ['Mark lives in UK',
'Rick lives in France',
'John Lives in US'],
1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})
df = df.explode('Location')
df['Person_IND'] = df['Location'].apply(lambda x : x.split(' ')[0])
df = df.loc[df['Name'] == df['Person_IND']]
df[['Name', 'Location']]
If you really need that middle column you can do this and re-name the columns
df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
'Location': {0: ['Mark lives in UK',
'Rick lives in France',
'John Lives in US'],
1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})
df1 = df.explode('Location')
df1['Person_IND'] = df1['Location'].apply(lambda x : x.split(' ')[0])
df1 = df1.loc[df1['Name'] == df1['Person_IND']]
df1 = df1[['Name', 'Location']]
df_merge = pd.merge(df, df1, on = 'Name')
df_merge
CodePudding user response:
You can try apply
on rows
df['Outcome'] = df.apply(lambda row: [loc for loc in row['Location'] if row['Name'] in loc], axis=1)
print(df)
Name Location \
0 Mark [Mark lives in UK, Rick lives in France, John Lives in US]
1 John [Mark lives in UK, Rick lives in France, John Lives in US]
2 Rick [Mark lives in UK, Rick lives in France, John Lives in US]
Outcome
0 [Mark lives in UK]
1 [John Lives in US]
2 [Rick lives in France]
Or you can try explode
df['Outcome'] = (df.explode('Location')
.loc[lambda df: df.apply(lambda row: row['Name'] in row['Location'], axis=1), 'Location'])
print(df)
Name Location \
0 Mark [Mark lives in UK, Rick lives in France, John Lives in US]
1 John [Mark lives in UK, Rick lives in France, John Lives in US]
2 Rick [Mark lives in UK, Rick lives in France, John Lives in US]
Outcome
0 Mark lives in UK
1 John Lives in US
2 Rick lives in France