Unlisting lists in a Dataframe column

Time:07-02

I have a DataFrame column in which each value is a list of two lists:

coordinates
----
[[36.2046069345455, 23.466756], [56.678766, 45.1405656576776]]
[[46.2034534576765, 56.877879], [34.207049, 18.1565655652422]]
[[41.3223449567164, 34.645445], [78.206545, 66.1402362184811]]
[[23.2046069887887, 87.234223], [76.212123, 15.3943493949348]]
[[33.9685958954948, 78.454555], [32.765666, 23.4685489900090]]
[[12.7665776555654, 45.987878], [43.787786, 45.3494893404820]]

I want to split it into four separate columns. I tried:

df['coordinates'] = df['coordinates'].apply(lambda x: ' '.join(dict.fromkeys(x).keys()))

But it raises

TypeError: unhashable type: 'list'

Any idea on how to solve it?

CodePudding user response:

Assuming that each value of 'coordinates' consists of one list containing two lists with two values, you can use something like this:

import pandas as pd

df = pd.DataFrame({
    "coordinates": [
        [[36.2046069345455, 23.466756], [56.678766, 45.1405656576776]],
        [[46.2034534576765, 56.877879], [34.207049, 18.1565655652422]],
        [[41.3223449567164, 34.645445], [78.206545, 66.1402362184811]],
        [[23.2046069887887, 87.234223], [76.212123, 15.3943493949348]],
        [[33.9685958954948, 78.454555], [32.765666, 23.4685489900090]],
        [[12.7665776555654, 45.987878], [43.787786, 45.3494893404820]]
    ]
})



# Pick element [i][j] of every row and name the resulting column after its position.
pd.concat(
    [df['coordinates'].str[i].str[j].rename(f'coordinates_{i}{j}')
     for i in [0, 1] for j in [0, 1]],
    axis=1,
)

------------------------------------------------------------------------
    coordinates_00  coordinates_01  coordinates_10  coordinates_11
0   36.204607       23.466756       56.678766       45.140566
1   46.203453       56.877879       34.207049       18.156566
2   41.322345       34.645445       78.206545       66.140236
3   23.204607       87.234223       76.212123       15.394349
4   33.968596       78.454555       32.765666       23.468549
5   12.766578       45.987878       43.787786       45.349489
------------------------------------------------------------------------

An even shorter alternative flattens each value with itertools.chain inside .apply:

from itertools import chain

pd.DataFrame(df['coordinates'].apply(lambda x: list(chain.from_iterable(x))).to_dict()).T

The resulting columns are labelled 0 to 3; you can then rename them as you like.
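
For instance, here is a minimal sketch of the renaming step. It assumes the flattened frame carries the default integer column labels, and the names x1, y1, x2, y2 are hypothetical (they are not from the question). Only the first two rows of the sample data are used to keep it short:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "coordinates": [
        [[36.2046069345455, 23.466756], [56.678766, 45.1405656576776]],
        [[46.2034534576765, 56.877879], [34.207049, 18.1565655652422]],
    ]
})

# Flatten each [[a, b], [c, d]] value into [a, b, c, d], one row per original row.
flat = pd.DataFrame(
    df["coordinates"].apply(lambda x: list(chain.from_iterable(x))).to_dict()
).T

# Hypothetical column names; pick whatever fits your data.
flat.columns = ["x1", "y1", "x2", "y2"]

# Optionally swap the original list column for the new columns.
result = df.drop(columns="coordinates").join(flat)
print(result)
```

The join step is only needed if the DataFrame has other columns you want to keep next to the new ones.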

CodePudding user response:

Probably not the prettiest solution, but this should do the trick:

import pandas as pd

coordinates = [
    [[36.2046069345455, 23.466756], [56.678766, 45.1405656576776]],
    [[46.2034534576765, 56.877879], [34.207049, 18.1565655652422]],
    [[41.3223449567164, 34.645445], [78.206545, 66.1402362184811]],
    [[23.2046069887887, 87.234223], [76.212123, 15.3943493949348]],
    [[33.9685958954948, 78.454555], [32.765666, 23.4685489900090]],
    [[12.7665776555654, 45.987878], [43.787786, 45.3494893404820]]]

df = pd.DataFrame({"coordinates": coordinates})
# Split each value into its two inner lists, then split each inner list into two columns.
pairs = pd.DataFrame(df["coordinates"].to_list(), columns=['c12', 'c34'])
df[["c1", "c2"]] = pd.DataFrame(pairs['c12'].to_list(), columns=['c1', 'c2'])
df[["c3", "c4"]] = pd.DataFrame(pairs['c34'].to_list(), columns=['c3', 'c4'])
del df["coordinates"]

print(df)
>           c1         c2         c3         c4
  0  36.204607  23.466756  56.678766  45.140566
  1  46.203453  56.877879  34.207049  18.156566
  2  41.322345  34.645445  78.206545  66.140236
  3  23.204607  87.234223  76.212123  15.394349
  4  33.968596  78.454555  32.765666  23.468549
  5  12.766578  45.987878  43.787786  45.349489
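
The same four columns can probably be built in one step as well. The sketch below swaps the nested to_list() calls for NumPy's reshape, which is a different technique, and it assumes every value holds exactly two pairs of two numbers (otherwise the reshape fails). Only two rows of the sample data are shown:

```python
import numpy as np
import pandas as pd

coordinates = [
    [[36.2046069345455, 23.466756], [56.678766, 45.1405656576776]],
    [[46.2034534576765, 56.877879], [34.207049, 18.1565655652422]],
]
df = pd.DataFrame({"coordinates": coordinates})

# Stack the nested lists into an (n_rows, 2, 2) array, then flatten each row to 4 values.
values = np.array(df["coordinates"].to_list()).reshape(len(df), 4)

# Reuse the original index so the rows stay aligned with df.
out = pd.DataFrame(values, columns=["c1", "c2", "c3", "c4"], index=df.index)
print(out)
```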