How to get a substring from a list of strings on a condition in pandas dataframe-CodePudding

This is what I have:

df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
 'Location': {0: ['Mark lives in UK',
   'Rick lives in France',
   'John Lives in US'],
  1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
  2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})

This is what I'd like to get:

desired_output = pd.DataFrame({'Name': ['Mark', 'John', 'Rick'],
                   'Location':[['Mark lives in UK', 'Rick lives in France', 'John Lives in US'], ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'], ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']],
              'Outcome': ['Mark lives in UK', 'John Lives in US', 'Rick lives in France']
            })

Here is what I tried:

df['Sorted'] = df['Location'].str.split(',')
df.apply(lambda x: [idx for idx,s in enumerate(x.sorted) if x.Name in x.sorted])

Thank you in advance!

CodePudding user response：

If you don't need the Location column from the beginning you can use this:

df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
 'Location': {0: ['Mark lives in UK',
   'Rick lives in France',
   'John Lives in US'],
  1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
  2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})
df = df.explode('Location')
df['Person_IND'] = df['Location'].apply(lambda x : x.split(' ')[0])
df = df.loc[df['Name'] == df['Person_IND']]
df[['Name', 'Location']]

If you really need that middle column you can do this and re-name the columns

df = pd.DataFrame({'Name': {0: 'Mark', 1: 'John', 2: 'Rick'},
 'Location': {0: ['Mark lives in UK',
   'Rick lives in France',
   'John Lives in US'],
  1: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US'],
  2: ['Mark lives in UK', 'Rick lives in France', 'John Lives in US']}})
df1 = df.explode('Location')
df1['Person_IND'] = df1['Location'].apply(lambda x : x.split(' ')[0])
df1 = df1.loc[df1['Name'] == df1['Person_IND']]
df1 = df1[['Name', 'Location']]
df_merge = pd.merge(df, df1, on = 'Name')
df_merge

CodePudding user response：

You can try apply on rows

df['Outcome'] = df.apply(lambda row: [loc for loc in row['Location'] if row['Name'] in loc], axis=1)

print(df)

   Name                                                    Location  \
0  Mark  [Mark lives in UK, Rick lives in France, John Lives in US]
1  John  [Mark lives in UK, Rick lives in France, John Lives in US]
2  Rick  [Mark lives in UK, Rick lives in France, John Lives in US]

                  Outcome
0      [Mark lives in UK]
1      [John Lives in US]
2  [Rick lives in France]

Or you can try explode

df['Outcome'] = (df.explode('Location')
                 .loc[lambda df: df.apply(lambda row: row['Name'] in row['Location'], axis=1), 'Location'])

print(df)

   Name                                                    Location  \
0  Mark  [Mark lives in UK, Rick lives in France, John Lives in US]
1  John  [Mark lives in UK, Rick lives in France, John Lives in US]
2  Rick  [Mark lives in UK, Rick lives in France, John Lives in US]

                Outcome
0      Mark lives in UK
1      John Lives in US
2  Rick lives in France