How to split a list column and add them as new column values in polars dataframe?

Time:10-27

I have a data frame as below.

pl.DataFrame({'combine_address':[ ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
                                 ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
                                ]})

Here, combine_address is a list-type column whose elements each contain 6 pipe-separated (|) values. I would like to split each element of the list on the separator (|).

Here is the expected output:

[image: expected output]

If a list has 3 elements, the split produces 3*6=18 columns.

If a list has 5 elements, the split produces 5*6=30 columns, and so on.
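The column count described above can be checked in plain Python before involving Polars. The sketch below is only an illustration: the helper name `split_elements` and the `field_N` column names are assumptions, not part of the original post.

```python
def split_elements(elements, sep="|"):
    """Split every string in `elements` on `sep` and flatten the parts.

    Returns a dict mapping hypothetical column names (field_0, field_1, ...)
    to the individual values, in their original order.
    """
    parts = [part for element in elements for part in element.split(sep)]
    return {f"field_{i}": part for i, part in enumerate(parts)}


row = [
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "Yes|#456 Lane|Apt#4|ABC|VA|50566",
    "No|#456 Lane|Apt#4|ABC|VA|50566",
]
columns = split_elements(row)
print(len(columns))  # 3 elements * 6 values each = 18
```

A list with 5 elements of 6 values each would give 30 entries in the same way.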

CodePudding user response:

import polars as pl
import pandas as pd
df=pd.DataFrame({'combine_address':[ ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
                                 ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
                                ]})

The above recreates the original data as a pandas DataFrame. Then you can try the following.

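a=[]
for i in range(len(df['combine_address'])):
    # split every element of row i and append the resulting lists to a
    a += [j.split('|') for j in df['combine_address'][i]]
b=[]
for i in range(len(a)):
    # flatten the lists of split values into one flat list
    b += a[i]

b is now a flat list with 36 elements.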

c=pd.DataFrame(b).T
pl.from_pandas(c)

This should resemble your expected output, with shape (1, 36).

I hope this will help you.

CodePudding user response:

Is this what you are looking for?



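import polars as pl

df = pl.DataFrame({"combine_address":[
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566", "Yes|#456 Lane|Apt#4|ABC|VA|50566", "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593", "No|#8495|APT#94|SWE|WA|43593", "Yes|#8495|APT#94|SWE|WA|43593"]
]})

# Note: newer Polars releases expose list operations under `.list` instead of `.arr`.
(df.select(
    pl.col("combine_address").reshape((1, -1))    # gather all elements into a single row
    .arr.join("|").str.split("|")                 # join with "|", then split out every value
    .arr.to_struct(n_field_strategy="max_width")  # one struct field per value
).unnest("combine_address"))                      # expand the struct fields into columns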
shape: (1, 36)
┌─────────┬───────────┬─────────┬─────────┬─────┬──────────┬──────────┬──────────┬──────────┐
│ field_0 ┆ field_1   ┆ field_2 ┆ field_3 ┆ ... ┆ field_32 ┆ field_33 ┆ field_34 ┆ field_35 │
│ ---     ┆ ---       ┆ ---     ┆ ---     ┆     ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ str     ┆ str       ┆ str     ┆ str     ┆     ┆ str      ┆ str      ┆ str      ┆ str      │
╞═════════╪═══════════╪═════════╪═════════╪═════╪══════════╪══════════╪══════════╪══════════╡
│ Yes     ┆ #456 Lane ┆ Apt#4   ┆ ABC     ┆ ... ┆ APT#94   ┆ SWE      ┆ WA       ┆ 43593    │
└─────────┴───────────┴─────────┴─────────┴─────┴──────────┴──────────┴──────────┴──────────┘
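To see why the join-then-split chain above produces 36 fields, the same steps can be mimicked in plain Python. This is only a sketch of the idea, not of how Polars works internally, and the variable names are made up for illustration.

```python
rows = [
    ["Yes|#456 Lane|Apt#4|ABC|VA|50566",
     "Yes|#456 Lane|Apt#4|ABC|VA|50566",
     "No|#456 Lane|Apt#4|ABC|VA|50566"],
    ["No|#8495|APT#94|SWE|WA|43593",
     "No|#8495|APT#94|SWE|WA|43593",
     "Yes|#8495|APT#94|SWE|WA|43593"],
]

# Step 1 (reshape): gather every element from every row into one list.
single_row = [element for row in rows for element in row]

# Step 2 (join): glue the elements together with the same separator,
# producing one long pipe-delimited string.
joined = "|".join(single_row)

# Step 3 (split): splitting that string yields every value in order.
fields = joined.split("|")

# Step 4 (to_struct + unnest): give each value its own named column.
record = {f"field_{i}": value for i, value in enumerate(fields)}

print(len(record))  # 6 elements * 6 values each = 36
```

The approach works because the elements are joined with the same separator that is used for the split, so it assumes no individual value contains a "|" of its own.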
