I am trying to turn each row of my dataframe into a list of names, for example ["Mike", "Jean"]. My dataframe looks like this:
0. Mike, Jean
1. May, Weather
2. Jack, 100
What I've tried:
df["NAME"] = df["NAME"].str.split(",")
for i in range(len(df["NAME"])):
    df["NAME"][i] = df["NAME"][i].split(",")
OUTPUT
0. [Mike, Jean]
1. [May, Weather]
2. [Jack, 100]
OUTPUT I WANT
0. ["Mike", "Jean"]
1. ["May", "Weather"]
2. ["Jack", "100"]
I am new to Python and Pandas.
CodePudding user response:
Assuming this input:
import pandas as pd
df = pd.DataFrame({'Name': ['Mike, Jean', 'May, Weather', 'Jack, 100']})
When you run:
df['Name'].str.split(', ')
and get:
0 [Mike, Jean]
1 [May, Weather]
2 [Jack, 100]
Name: Name, dtype: object
The [Mike, Jean] format is just how pandas displays a list of strings: the Series repr leaves out the quotes.
The real data is indeed a Series of lists, as shown by an explicit conversion of the Series to a list:
df['Name'].str.split(', ').to_list()
output:
[['Mike', 'Jean'],
 ['May', 'Weather'],
 ['Jack', '100']]
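To check this for yourself, here is a minimal sketch that rebuilds the same input and inspects one cell directly. The variable names split_names and first are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mike, Jean', 'May, Weather', 'Jack, 100']})
split_names = df['Name'].str.split(', ')

# Each cell holds a real Python list of str objects.
first = split_names.iloc[0]
print(type(first))  # <class 'list'>
print(first)        # ['Mike', 'Jean']

# repr() of each cell shows the quotes that the Series display leaves out.
print(split_names.map(repr).to_list())
```

Printing a single cell goes through Python's own list repr, which is why the quotes show up there but not in the Series display.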
CodePudding user response:
You don't really need a for-loop; you can do the split with:
df['Name'] = df['Name'].str.split(', ')
The separator has to be passed explicitly: a bare str.split() splits on whitespace and would leave the comma attached to the first name ('Mike,').
This returns a pandas Series containing one list per row, which pandas displays without quotes:
0 [Mike, Jean]
1 [May, Weather]
2 [Jack, 100]
If you wish to extract the Series' values as a plain list of lists, you can then use:
name_lists = df['Name'].values.tolist()
Returning:
[['Mike', 'Jean'], ['May', 'Weather'], ['Jack', '100']]
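If the separator in the real data is not always exactly a comma followed by one space, a regular-expression split may be more forgiving. The sketch below is only an illustration: the unevenly spaced input is made up, and the regex=True argument assumes pandas 1.4 or newer.

```python
import pandas as pd

# Hypothetical input with inconsistent spacing around the commas.
df = pd.DataFrame({'Name': ['Mike, Jean', 'May,Weather', 'Jack ,  100']})

# Split on a comma plus any surrounding whitespace
# (the regex keyword assumes pandas >= 1.4).
name_lists = df['Name'].str.split(r'\s*,\s*', regex=True).tolist()
print(name_lists)  # [['Mike', 'Jean'], ['May', 'Weather'], ['Jack', '100']]
```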
CodePudding user response:
You can iterate over the values of your dataframe and easily convert the rows of the resulting NumPy array into lists (see the example code below):
import pandas as pd
df = pd.DataFrame()
df['Name'] = ['Name1', 'Name2']
df['FirstName'] = ['FirstName1', 'FirstName2']
L = []
for row in df.values:
    L.append(list(row))
print(L)
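The same result can probably be had without the explicit loop, since the NumPy array behind df.values has its own tolist() method. A short sketch using the same example data (the name rows is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Name1', 'Name2'],
                   'FirstName': ['FirstName1', 'FirstName2']})

# One inner list per row, without an explicit Python loop.
rows = df.values.tolist()
print(rows)  # [['Name1', 'FirstName1'], ['Name2', 'FirstName2']]
```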
Cheers
CodePudding user response:
The code snippet below should solve your problem :)
import pandas as pd
df = pd.DataFrame(["[Mike, Jean]" , "[May, Weather]", "[Jack, 100]"], columns=['name'])
df.head()
             name
0    [Mike, Jean]
1  [May, Weather]
2     [Jack, 100]
df['type_name'] = df.apply(lambda y: type(y['name']), axis=1)
df['name1'] = df.apply(lambda y: y['name'].replace('[', '').replace(']', '').split(", "), axis=1)
df['type_name1'] = df.apply(lambda y: type(y['name1']), axis=1)
df.head()
             name      type_name           name1      type_name1
0    [Mike, Jean]  <class 'str'>    [Mike, Jean]  <class 'list'>
1  [May, Weather]  <class 'str'>  [May, Weather]  <class 'list'>
2     [Jack, 100]  <class 'str'>     [Jack, 100]  <class 'list'>
final_list = df['name1'].values.tolist()
print(final_list)
[['Mike', 'Jean'], ['May', 'Weather'], ['Jack', '100']]
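If the column really does hold bracketed strings like the ones above, the row-wise apply calls could likely be replaced by pandas' vectorized string methods. This is a sketch under that assumption, reusing the column names from the snippet above:

```python
import pandas as pd

df = pd.DataFrame({'name': ['[Mike, Jean]', '[May, Weather]', '[Jack, 100]']})

# Drop the surrounding brackets, then split each string on ', '.
df['name1'] = df['name'].str.strip('[]').str.split(', ')

final_list = df['name1'].tolist()
print(final_list)  # [['Mike', 'Jean'], ['May', 'Weather'], ['Jack', '100']]
```

str.strip('[]') removes any leading or trailing '[' and ']' characters, so it does the same job as the two replace calls as long as brackets only appear at the ends of each string.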