I have a pandas dataframe df
, which look like this:
df = pd.DataFrame({'Name':['Harry', 'Sam', 'Raj', 'Jamie', 'Rupert'],
'Country':['USA', "['USA', 'UK', 'India']", "['India', 'USA']", 'Russia', 'China']})
Name Country
Harry USA
Sam ['USA', 'UK', 'India']
Raj ['India', 'USA']
Jamie Russia
Rupert China
Some values in Country
column are list, and I want to replace those list with the first element in the list, so that it will look like this:
Name Country
Harry USA
Sam USA
Raj India
Jamie Russia
Rupert China
CodePudding user response:
As you have strings, you could use a regex here:
df['Country'] = df['Country'].str.extract('((?<=\[["\'])[^"\']*|^[^"\'] $)')
output (as a new column for clarity):
Name Country Country2
0 Harry USA USA
1 Sam ['USA', 'UK', 'India'] USA
2 Raj ['India', 'USA'] India
3 Jamie Russia Russia
4 Rupert China China
regex:
( # start capturing
(?<=\[["\']) # if preceded by [" or ['
[^"\']* # get all text until " or '
| # OR
^[^"\'] $ # get whole string if it doesn't contain " or '
) # stop capturing
CodePudding user response:
Try something like:
import ast
def changeStringList(value):
try:
myList = ast.literal_eval(value)
return myList[0]
except:
return value
df["Country"] = df["Country"].apply(changeStringList)
df
Output
Name | Country | |
---|---|---|
0 | Harry | USA |
1 | Sam | USA |
2 | Raj | India |
3 | Jamie | Russia |
4 | Rupert | China |
Note that, by using the changeStringList
function, we try to reform the string list to an interpretable list of strings and return the first value. If it is not a list, then it returns the value itself.
CodePudding user response:
Try this:
import ast
df['Country'] = df['Country'].where(df['Country'].str.contains('[', regex=False), '[\'' df['Country'] '\']').apply(ast.literal_eval).str[0]
Output:
>>> df
Name Country
0 Harry USA
1 Sam USA
2 Raj India
3 Jamie Russia
4 Rupert China
CodePudding user response:
A regex solution.
import re
tempArr = []
for val in df["Country"]:
if val.startswith("["):
val = re.findall(r"[A-Za-z] ",val)[0]
tempArr.append(val)
else: tempArr.append(val)
df["Country"] = tempArr
df
Name Country
0 Harry USA
1 Sam USA
2 Raj India
3 Jamie Russia
4 Rupert China
CodePudding user response:
If you have string you could use Series.str.strip
in order to remove ']'
or '['
and then use Series.str.split
to convert all rows to list ,after that we could use .str
accesor
df['Country'] = df['Country'].str.strip('[|]').str.split(',')\
.str[0].str.replace("'", "")
Name Country
0 Harry USA
1 Sam USA
2 Raj India
3 Jamie Russia
4 Rupert China