Explode array [(str), (int)] in column dataframe pandas

Time:03-10

I have a dataframe:

    import pandas as pd

    df = pd.DataFrame({
        'day': ['11', '12'],
        'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
    })

   day                               City
0  11       [(Mumbai, 1),(Bangalore, 2)]
1  12  [(Pune, 3),(Mumbai, 4),(Delh, 5)]

I want to explode the City column, but when I do, nothing changes:

df2 = df.explode('City')

The output I want:

  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)

Answer:

You can't explode strings. You need to find a way to convert them to lists first.
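To illustrate the difference, here is a small sketch with two made-up one-row frames (not taken from the question): one stores the value as a string, the other as a real list of tuples. DataFrame.explode only expands list-like cells, and a string is treated as a scalar, so it is passed through unchanged.

```python
import pandas as pd

# The question's situation: the cell is a *string* that only looks like a list
as_str = pd.DataFrame({'day': ['11'], 'City': ['[(Mumbai, 1),(Bangalore, 2)]']})
# The same data stored as an actual Python list of tuples
as_list = pd.DataFrame({'day': ['11'], 'City': [[('Mumbai', 1), ('Bangalore', 2)]]})

# explode expands list-like cells; a string counts as a scalar and is left as is
str_out = as_str.explode('City')                       # still one row
list_out = as_list.explode('City', ignore_index=True)  # one row per tuple

print(str_out)
print(list_out)
```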

Assuming your city names contain only letters (or spaces), you could use a regex to put quotes around them and then convert each string to a list with ast.literal_eval:

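from ast import literal_eval

# Quote each city name so the string becomes a valid Python literal,
# e.g. '[(Mumbai, 1),...]' -> '[("Mumbai", 1),...]', then parse it into a list of tuples
df['City'] = (df['City']
              .str.replace(r'([a-zA-Z ]+),', r'"\1",', regex=True)
              .apply(literal_eval)
              )

# One row per tuple; ignore_index=True renumbers the result 0..n-1
df2 = df.explode('City', ignore_index=True)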

output:

  day            City
0  11     (Mumbai, 1)
1  11  (Bangalore, 2)
2  12       (Pune, 3)
3  12     (Mumbai, 4)
4  12       (Delh, 5)
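If the city names might contain characters other than letters and spaces (digits, dots, hyphens), a variant that skips literal_eval and pulls the pairs out directly with re.findall may hold up better. This is a sketch assuming every item has the form "(name, integer)" with no comma or parenthesis inside the name; PAIR and parse_pairs are hypothetical names chosen for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

# Assumed layout: "(name, integer)"; the name may hold anything but "," "(" ")"
PAIR = re.compile(r'\(([^,()]+),\s*(\d+)\)')

def parse_pairs(s):
    """Turn '[(Mumbai, 1),(Bangalore, 2)]' into [('Mumbai', 1), ('Bangalore', 2)]."""
    # findall with two groups returns a list of (name, number) string tuples
    return [(name.strip(), int(num)) for name, num in PAIR.findall(s)]

df['City'] = df['City'].apply(parse_pairs)
df2 = df.explode('City', ignore_index=True)
print(df2)
```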

Answer:

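import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

# Mark the boundary between items with "-": "),(" -> ")-("
df['City'] = [re.sub(r"\),\(", ")-(", x) for x in df['City']]
# Remove all square brackets and parentheses
df['City'] = [re.sub(r"\[|\]|\(|\)", "", x) for x in df['City']]
# Split on the marker to get a list of "name, number" strings
df['City'] = [x.split("-") for x in df['City']]
df2 = df.explode('City').reset_index(drop=True)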

You have to process each string and convert it to a list before calling explode.

output:

  day          City
0  11     Mumbai, 1
1  11  Bangalore, 2
2  12       Pune, 3
3  12     Mumbai, 4
4  12       Delh, 5
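The "-" marker used above would also split a city name that itself contains a hyphen (for example Wilkes-Barre). A minimal variant, assuming the same "[(name, number),...]" layout, splits directly on the "),(" boundary with re.split, so no placeholder character is needed; split_items is a hypothetical helper name.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'day': ['11', '12'],
    'City': ['[(Mumbai, 1),(Bangalore, 2)]', '[(Pune, 3),(Mumbai, 4),(Delh, 5)]']
})

def split_items(s):
    # Drop the outer "[(" and ")]", then split on "),(" (optional spaces after the comma)
    return re.split(r'\),\s*\(', s.strip('[]()'))

df['City'] = df['City'].apply(split_items)
df2 = df.explode('City', ignore_index=True)
print(df2)
```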