How to replace a list with first element of list in pandas dataframe column?

I have a pandas DataFrame df that looks like this:

import pandas as pd

df = pd.DataFrame({'Name':['Harry', 'Sam', 'Raj', 'Jamie', 'Rupert'],
                   'Country':['USA', "['USA', 'UK', 'India']", "['India', 'USA']", 'Russia', 'China']})

Name           Country

Harry          USA
Sam            ['USA', 'UK', 'India']
Raj            ['India', 'USA']
Jamie          Russia
Rupert         China

Some values in the Country column are lists (stored as strings), and I want to replace each of those lists with its first element, so that the frame looks like this:

Name           Country

Harry          USA
Sam            USA
Raj            India
Jamie          Russia
Rupert         China
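It helps to confirm what the "list" cells actually are before choosing a method. A minimal sketch, assuming only pandas: in the sample frame every Country cell is a str, which is why the answers below parse or regex-match text. The `real` Series is a hypothetical variant where the column holds genuine Python lists; in that case an isinstance check is enough.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Harry', 'Sam', 'Raj', 'Jamie', 'Rupert'],
                   'Country': ['USA', "['USA', 'UK', 'India']", "['India', 'USA']",
                               'Russia', 'China']})

# Every cell is a str, even the ones that only *look* like lists
print(df['Country'].map(type).unique())

# Hypothetical variant: a column holding real Python lists instead of strings
real = pd.Series(['USA', ['USA', 'UK', 'India'], ['India', 'USA'], 'Russia', 'China'])
first = real.apply(lambda v: v[0] if isinstance(v, list) else v)
print(first.tolist())  # ['USA', 'USA', 'India', 'Russia', 'China']
```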

Answer:

As you have strings, you could use a regex here:

df['Country'] = df['Country'].str.extract(r'((?<=\[["\'])[^"\']*|^[^"\']+$)', expand=False)

Output (assigned to a new column, Country2, for clarity):

     Name                 Country Country2
0   Harry                     USA      USA
1     Sam  ['USA', 'UK', 'India']      USA
2     Raj        ['India', 'USA']    India
3   Jamie                  Russia   Russia
4  Rupert                   China    China

regex:

(             # start capturing
(?<=\[["\'])  # if preceded by [" or ['
[^"\']*       # get all text until " or '
|             # OR
^[^"\']+$     # get the whole string if it contains no " or '
)             # stop capturing
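To see the pattern in isolation, here is a small sketch using Python's re module directly. As far as I know, str.extract applies the pattern with search semantics per cell, so re.search should behave the same way. The helper name first_country is hypothetical, introduced only for this illustration.

```python
import re

# The pattern from the answer: text after [' or [" , OR a whole quote-free string
PATTERN = re.compile(r'((?<=\[["\'])[^"\']*|^[^"\']+$)')

def first_country(value):
    """Hypothetical helper: return the captured group, or None if nothing matches."""
    m = PATTERN.search(value)
    return m.group(1) if m else None

for s in ['USA', "['USA', 'UK', 'India']", "['India', 'USA']"]:
    print(s, '->', first_country(s))
```

The lookbehind is fixed-width (two characters), which Python's re module requires; that is why it checks exactly one bracket plus one quote.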

Answer:

Try something like:

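import ast

def changeStringList(value):
    # Parse string-encoded lists such as "['USA', 'UK']"; plain names like "USA"
    # are not valid Python literals and raise ValueError/SyntaxError
    try:
        myList = ast.literal_eval(value)
        return myList[0]
    except (ValueError, SyntaxError, TypeError, IndexError):
        return value

df["Country"] = df["Country"].apply(changeStringList)
df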

Output

     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China

The changeStringList function tries to parse each string with ast.literal_eval into a real Python list and returns its first element. If the value is not a list (for example a plain country name such as "USA", which literal_eval rejects), the function returns the value unchanged.
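A catch-all except can also hide genuinely malformed data. A possible alternative, sketched under the assumption that only strings beginning with "[" are meant to be lists: parse just those, pass real lists and missing values through, and let malformed list strings raise. The name first_if_list is hypothetical.

```python
import ast
import pandas as pd

def first_if_list(value):
    """Hypothetical helper: unwrap list-like values, pass everything else through."""
    if isinstance(value, list):
        # Already a real Python list
        return value[0] if value else value
    if isinstance(value, str) and value.lstrip().startswith('['):
        # Only strings that look like lists are parsed; malformed ones still raise
        parsed = ast.literal_eval(value)
        if isinstance(parsed, list) and parsed:
            return parsed[0]
    return value

mixed = pd.Series(['USA', "['USA', 'UK', 'India']", ['India', 'USA'], None])
result = mixed.apply(first_if_list)
print(result.tolist())
```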

Answer:

Try this:

import ast
# Wrap plain names as "['Name']" so every row parses as a list, then take element 0
df['Country'] = df['Country'].where(df['Country'].str.contains('[', regex=False),
                                    "['" + df['Country'] + "']").apply(ast.literal_eval).str[0]

Output:

>>> df
     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China
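The one-liner chains three steps; breaking them apart on a two-row sample shows what each intermediate looks like. This is only a sketch of the same approach, and it assumes no country name contains an apostrophe, since the wrapping step would then produce an unparseable string.

```python
import ast
import pandas as pd

s = pd.Series(['USA', "['India', 'USA']"])

# Step 1: keep list-like strings, wrap everything else as a one-element list string
has_bracket = s.str.contains('[', regex=False)
wrapped = s.where(has_bracket, "['" + s + "']")
print(wrapped.tolist())       # ["['USA']", "['India', 'USA']"]

# Step 2: turn every string into a real Python list
lists = wrapped.apply(ast.literal_eval)
print(lists.tolist())         # [['USA'], ['India', 'USA']]

# Step 3: the .str accessor also indexes into list elements
print(lists.str[0].tolist())  # ['USA', 'India']
```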

Answer:

A regex solution.

import re

tempArr = []
for val in df["Country"]:
    if val.startswith("["):
        # Take the first run of letters, i.e. the first list element
        val = re.findall(r"[A-Za-z]+", val)[0]
    tempArr.append(val)

df["Country"] = tempArr

df

     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China
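The same idea can be written without the explicit loop. A sketch, assuming every cell is a string and that country names consist of ASCII letters only (like the loop above, it would truncate a name such as "United Kingdom"): Series.mask swaps in the extracted first word only on rows that start with "[".

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', "['USA', 'UK', 'India']", "['India', 'USA']",
                               'Russia', 'China']})

is_list = df['Country'].str.startswith('[')
# Replace only the list-like rows with their first run of letters; other rows stay as-is
df['Country'] = df['Country'].mask(is_list,
                                   df['Country'].str.extract(r'([A-Za-z]+)', expand=False))
print(df['Country'].tolist())  # ['USA', 'USA', 'India', 'Russia', 'China']
```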

Answer:

If the values are strings, you could use Series.str.strip to remove the '[' and ']' characters, then Series.str.split to turn every row into a list; after that, the .str accessor can pick the first element and drop the quotes:

df['Country'] = (df['Country'].str.strip('[]')   # strip() takes a set of characters, not a regex
                              .str.split(',')
                              .str[0]
                              .str.replace("'", "", regex=False))


     Name Country
0   Harry     USA
1     Sam     USA
2     Raj   India
3   Jamie  Russia
4  Rupert   China
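The replace("'", "") step only handles single-quoted elements. If some rows might use double quotes or carry stray spaces (an assumption about the data, not something in the question), a final str.strip with a set of characters covers both cases. A minimal sketch:

```python
import pandas as pd

s = pd.Series(['USA', "['USA', 'UK']", '["India", "USA"]', ' Russia '])

first = (s.str.strip('[]')
          .str.split(',')
          .str[0]
          .str.strip(' \'"'))  # drop surrounding spaces and either quote style
print(first.tolist())  # ['USA', 'USA', 'India', 'Russia']
```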