Dataframe tuple not being created as expected

Time:03-24

I'm using the code below to create a dataframe:

import pandas as pd
import numpy as np


col1 = [[1,2,3]]
col2 = [['a1','a2','a3']]
col3 = [['b1','b2','b3']]
col4 = ['id1']

df_test = pd.DataFrame(
    {
        'col1': col1,
        'col2': col2,
        'col3': col3,
        'col4': col4,
    })
df_test = df_test.set_index('col4')

creates :

    col1    col2    col3
col4            
id1 [1, 2, 3]   [a1, a2, a3]    [b1, b2, b3]

I'm attempting to create a result of the format:

col4
id1    ([1, a1, b1], [2, a2, b2], [3, a3, b3])
dtype: object

Each element of the tuple is a list that collects the items at the same position in col1, col2 and col3.

Using :

df_test = df_test.apply(tuple, axis=1)
df_test

to convert the dataframe renders :

col4
id1    ([1, 2, 3], [a1, a2, a3], [b1, b2, b3])
dtype: object

How can I create a result of the following format?

col4
id1    ([1, a1, b1], [2, a2, b2], [3, a3, b3])
dtype: object

using either

  1. the original source data :

col1 = [[1,2,3]]
col2 = [['a1','a2','a3']]
col3 = [['b1','b2','b3']]
col4 = ['id1']

or

  2. creating the tuple format after creating the dataframe :

    df_test = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4})
    df_test = df_test.set_index('col4')

Using :

pd.DataFrame([tuple(zip(*np.array([df_test['col1'],df_test['col2'], df_test['col3']]).T.tolist()))], index= df_test.index)

renders :

0   1   2
col4            
id1 ([1, 2, 3],)    ([a1, a2, a3],) ([b1, b2, b3],)

I've tried wrapping the elements in a list :

pd.DataFrame([tuple(zip(*np.array([list(df_test['col1']),list(df_test['col2']), list(df_test['col3'])]).T.tolist()))], index= df_test.index)

but I get the same result.

CodePudding user response:

Update:

After dataframe creation, use:

>>> df_test.apply(lambda x: tuple(list(i) for i in zip(*x)), axis=1)
id1    ([1, a1, b1], [2, a2, b2], [3, a3, b3])
dtype: object
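
For reference, here is a minimal self-contained sketch of the approach above, rebuilt from the question's source data. zip(*row) groups the items at the same position in the row's three lists, and each group is turned back into a list. The names row, group and result are my own, not from the question.

```python
import pandas as pd

col1 = [[1, 2, 3]]
col2 = [['a1', 'a2', 'a3']]
col3 = [['b1', 'b2', 'b3']]
col4 = ['id1']

df_test = pd.DataFrame(
    {'col1': col1, 'col2': col2, 'col3': col3, 'col4': col4}
).set_index('col4')

# Each row holds three lists. zip(*row) walks them in parallel, yielding
# (1, 'a1', 'b1'), (2, 'a2', 'b2'), ... and list() turns each group into a list.
result = df_test.apply(
    lambda row: tuple(list(group) for group in zip(*row)),
    axis=1,
)

print(result)
```

Because the lists are never passed through numpy here, the integers in col1 should keep their original type.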

You can try:

>>> pd.DataFrame([tuple(zip(*np.array([col1, col2, col3]).T.tolist()))], index=col4)

                                           0
id1  ([1, a1, b1], [2, a2, b2], [3, a3, b3])
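
One caveat with the numpy version: np.array coerces the mixed integers and strings to a single string dtype, so the 1 shown above is actually the string '1'. If that matters, a numpy-free sketch along these lines should keep the original types. It returns a Series rather than a one-column DataFrame, and the names rows and result are illustrative only.

```python
import pandas as pd

col1 = [[1, 2, 3]]
col2 = [['a1', 'a2', 'a3']]
col3 = [['b1', 'b2', 'b3']]
col4 = ['id1']

# Outer zip: walk the rows of the three source columns together.
# Inner zip: group the items at the same position within one row.
rows = [
    tuple(list(group) for group in zip(a, b, c))
    for a, b, c in zip(col1, col2, col3)
]

# One tuple per id, indexed by col4.
result = pd.Series(rows, index=pd.Index(col4, name='col4'))

print(result)
```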