How do I extract elements from a list in a pandas dataframe column?

Time:10-12

I have the following lists:

dates = ['12/29/2020', '12/25/2020', '12/22/2020']
numbers = [ [1, 31, 35], [17, 23, 36], [29, 53, 56] ]

I used them to make a DataFrame:

df = pd.DataFrame(
    {
        'date':dates,
        'nums': numbers
    }
)

This gives me a DataFrame with two columns. I want to break out the elements in the list to create 3 columns (one for each number in the list) to end up with the following DataFrame:

     date            num1 num2 num3 
0    '12/29/2020'    1    31   35
1    '12/25/2020'    17   23   36
2    '12/22/2020'    29   53   56

How can I do this?

CodePudding user response:

Build a new DataFrame from the nums column by first converting it to a list of lists, then concatenate it with the date column:

pd.concat([df.date, pd.DataFrame(df.nums.to_list()).add_prefix('num')], axis=1)

         date  num0  num1  num2
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56
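
Note that add_prefix numbers the columns from 0, while the question asks for num1 through num3. A minimal self-contained sketch of the same concat approach with 1-based names, assuming every list in nums has the same length (the name `expanded` is only illustrative):

```python
import pandas as pd

dates = ['12/29/2020', '12/25/2020', '12/22/2020']
numbers = [[1, 31, 35], [17, 23, 36], [29, 53, 56]]
df = pd.DataFrame({'date': dates, 'nums': numbers})

# Expand each list into one row of a new frame; reuse df's index so concat aligns rows.
expanded = pd.DataFrame(df['nums'].tolist(), index=df.index)

# Replace the integer column labels 0, 1, 2 with num1, num2, num3.
expanded.columns = [f'num{i + 1}' for i in range(expanded.shape[1])]

result = pd.concat([df['date'], expanded], axis=1)
print(result)
```

Taking the column count from `expanded.shape[1]` avoids hard-coding 3, so the same lines should work for lists of any fixed length.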

CodePudding user response:

Create a new dataframe and join it back:

>>> df[['date']].join(pd.DataFrame(df['nums'].tolist()).rename(lambda x: f'num{x + 1}', axis=1))
         date  num1  num2  num3
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56
>>> 

Or just add_prefix:

>>> df[['date']].join(pd.DataFrame(df['nums'].tolist()).add_prefix('num'))
         date  num0  num1  num2
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56
>>> 
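
One thing to watch with join: it aligns rows on the index, and a frame built from tolist() gets a fresh 0..n-1 index. That matches here, but may not if df was filtered or re-indexed earlier. A hedged sketch, using a made-up non-default index to illustrate, that passes the original index through:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'date': ['12/29/2020', '12/25/2020', '12/22/2020'],
        'nums': [[1, 31, 35], [17, 23, 36], [29, 53, 56]],
    },
    index=[10, 20, 30],  # hypothetical index, e.g. left over from earlier filtering
)

# index=df.index keeps the expanded rows lined up with the rows they came from.
expanded = pd.DataFrame(df['nums'].tolist(), index=df.index).add_prefix('num')

result = df[['date']].join(expanded)
print(result)
```

Without `index=df.index`, the join in this example would find no matching labels and fill the new columns with NaN.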

CodePudding user response:

The other answers cover the case where you need to fix an existing DataFrame. If you control how the data is built, though, it is much easier to reshape it before creating the DataFrame:

In [1]: import pandas as pd

In [2]: dates = ['12/29/2020', '12/25/2020', '12/22/2020']

In [3]: numbers = [[1, 31, 35], [17, 23, 36], [29, 53, 56]]

In [4]: nums = {f"num{i}": n for i, n in enumerate(zip(*numbers), 1)}

In [5]: df = pd.DataFrame({"dates": dates, **nums})

In [6]: df
Out[6]:
        dates  num1  num2  num3
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56

Or, another way:

In [7]: data = [[date, *nums] for date, nums in zip(dates, numbers)]

In [8]: pd.DataFrame(data, columns=["dates", "num1", "num2", "num3"])
Out[8]:
        dates  num1  num2  num3
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56
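
A caveat on the zip(*numbers) version: zip stops at the shortest list, so trailing values are silently dropped if the inner lists differ in length. A minimal sketch, assuming ragged input is possible (the shortened second list below is invented for illustration), that uses itertools.zip_longest to pad with None instead:

```python
from itertools import zip_longest

import pandas as pd

dates = ['12/29/2020', '12/25/2020', '12/22/2020']
numbers = [[1, 31, 35], [17, 23], [29, 53, 56]]  # hypothetical ragged input

# zip_longest transposes rows to columns and fills missing slots with None.
nums = {f"num{i}": list(col) for i, col in enumerate(zip_longest(*numbers), 1)}

df = pd.DataFrame({"dates": dates, **nums})
print(df)
```

The padded slot shows up as a missing value in num3, which makes the gap visible rather than losing the third column entirely.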

CodePudding user response:

You can use a dataframe constructor like this:

pd.DataFrame(numbers, 
             index=dates, 
             columns=[f'num{i + 1}' for i in range(len(numbers[0]))])\
  .rename_axis('dates').reset_index()

Output:

        dates  num1  num2  num3
0  12/29/2020     1    31    35
1  12/25/2020    17    23    36
2  12/22/2020    29    53    56