How to unnest a date column and a related column together in Pandas?

Time:10-15

I have a dataframe containing two columns I would like to explode / unnest together. One contains dates, the other contains information related to the dates.

Here is what the initial df looks like:

import pandas as pd

data = [
    ["ABC", 2002, ["AB", "AB", "EF"], ["2002-05-06", "2002-05-07", "2002-05-12"]],
    ["DEF", 2002, [["CD", "EF"]], ["2002-06-12", "2002-06-13"]],
    ["GHI", 2002, [["JK"]], ["2002-03-02"]],
    ["JKL", 2002, [[]], ["2002-03-02"]],
]

df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])
df

What I want it to look like is this, with the dates and their related list elements unpacked together:

data = [
    ["ABC", 2002, ["AB"], ["2002-05-06"]],
    ["ABC", 2002, ["AB"], ["2002-05-07"]],
    ["ABC", 2002, ["EF"], ["2002-05-12"]],
    ["DEF", 2002, ["CD"], ["2002-06-12"]],
    ["DEF", 2002, ["EF"], ["2002-06-13"]],
    ["GHI", 2002, [["JK"]], ["2002-03-02"]],
    ["JKL", 2002, [[]], ["2002-03-02"]],
]

df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])
df

I have tried exploding the list and date_list columns separately, but I don't know of a way to unnest them together so that the elements stay paired in order. Does anyone know how to do this?

CodePudding user response:

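If I understood you correctly (this assumes every cell in the list column holds a flat list, such as ["CD", "EF"] rather than [["CD", "EF"]], so that both columns explode to the same number of rows):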

extracted = df['list'].explode().to_frame().reset_index(drop=True).join(df['date_list'].explode().reset_index())
df = df[['ID', 'year']].merge(extracted[['list', 'date_list', 'index']], left_index=True, right_on='index').drop(columns=['index'])

Output:

    ID  year list   date_list
0  ABC  2002   AB  2002-05-06
1  ABC  2002   AB  2002-05-07
2  ABC  2002   EF  2002-05-12
3  DEF  2002   CD  2002-06-12
4  DEF  2002   EF  2002-06-13
5  GHI  2002   JK  2002-03-02
6  JKL  2002  NaN  2002-03-02
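
Another option, offered as a sketch rather than a definitive answer: since pandas 1.3, DataFrame.explode accepts a list of columns and unpacks them in lockstep. The sample data in the question nests some of the list cells one level deeper (for example [["CD", "EF"]]), so the snippet below first flattens them with flatten_once, a hypothetical helper written for this example and not part of pandas. It assumes the nesting is never deeper than one level.

```python
import pandas as pd

# Sample data copied from the question; some "list" cells are nested one level.
data = [
    ["ABC", 2002, ["AB", "AB", "EF"], ["2002-05-06", "2002-05-07", "2002-05-12"]],
    ["DEF", 2002, [["CD", "EF"]], ["2002-06-12", "2002-06-13"]],
    ["GHI", 2002, [["JK"]], ["2002-03-02"]],
    ["JKL", 2002, [[]], ["2002-03-02"]],
]
df = pd.DataFrame(data, columns=["ID", "year", "list", "date_list"])


def flatten_once(values):
    """Hypothetical helper: remove one level of nesting,
    e.g. [["CD", "EF"]] -> ["CD", "EF"] and [[]] -> []."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


# Make every "list" cell a flat list so it lines up with "date_list".
df["list"] = df["list"].apply(flatten_once)

# Explode both columns together (requires pandas >= 1.3).
# An empty list is treated as a single missing value, so the JKL row
# keeps its date and gets NaN in the "list" column.
out = df.explode(["list", "date_list"], ignore_index=True)
print(out)
```

If a row's two lists have different lengths, explode raises a ValueError instead of silently misaligning the values, which is a useful sanity check for this kind of paired data.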