I have a pandas data frame with a column in which every entry is a list, and I would like to get the first element of each list. How can I do this?
Working example
My original dataset is a pandas data frame that looks like:
# Import raw dataset from URL
import pandas as pd

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin', 'Carname']
# Fields are separated by runs of whitespace; '?' marks missing values
train = pd.read_csv(url, names=column_names,
                    na_values='?', sep=r'\s+',
                    skipinitialspace=True)
temp1 = pd.DataFrame(train["Carname"].str.split())
print(temp1)
Carname
0 [plymouth, champ]
1 [amc, matador]
2 [chevroelt, chevelle, malibu]
... ...
1489 [vw, dasher, (diesel)]
1490 [honda, accord]
1491 [ford, escort, 4w]
The desired result would be something like:
'plymouth'
'amc'
'chevroelt'
.....
CodePudding user response:
You can use the string accessor .str[], as follows:
temp1['Carname'].str[0] # str[0] for first element in list
Result:
0 chevrolet
1 buick
2 plymouth
3 amc
4 ford
...
393 ford
394 vw
395 dodge
396 ford
397 chevy
Name: Carname, Length: 398, dtype: object
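To try this without downloading the dataset, here is a minimal sketch on a small hand-made Series. The sample values (including the empty list and the missing entry) are made up for illustration and are not taken from the real data. It also suggests that .str[0] yields a missing value rather than raising when a list is empty or the entry itself is missing:

```python
import pandas as pd

# Hypothetical sample data: two normal lists, one empty list, one missing entry
carname = pd.Series(
    [["plymouth", "champ"], ["amc", "matador"], [], None],
    name="Carname",
)

# .str[0] picks the first element of each list-like entry
first_words = carname.str[0]
print(first_words)
```

The first two rows should come out as 'plymouth' and 'amc', while the last two are reported as missing.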
CodePudding user response:
You could apply a function to the Series that selects the first element of each list, resulting in a new Series (here s stands for your Series of lists, for example temp1['Carname']):
s_first_elements = s.apply(lambda x: x[0])
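Note that a plain x[0] raises an error if any entry is an empty list or is missing. A rough sketch of a guarded version is shown below; the helper name first_or_none and the sample data are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data: two normal lists, one empty list, one missing entry
s = pd.Series([["plymouth", "champ"], ["amc", "matador"], [], None])

def first_or_none(item):
    """Return the first element of a non-empty list, otherwise None."""
    if isinstance(item, list) and item:
        return item[0]
    return None

s_first_elements = s.apply(first_or_none)
print(s_first_elements)
```

If you are sure every entry is a non-empty list, the one-line lambda is enough.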