Split dataframe column into equal sublengths and return the new dataframe

Time:04-11

I have a dataframe like the one below, and I want to split the string column into multiple rows, each holding an equal-length chunk of 4 characters.

date, string
2002-06-01, 12345678
2002-06-02, 87654321

Expected Output

date, string
2002-06-01, 1234
2002-06-01, 5678
2002-06-02, 8765
2002-06-02, 4321

I have tried the example given here: Split cell into multiple rows in pandas dataframe

from itertools import chain

def chainer(s):
    return list(chain.from_iterable(s.str.split(df['string'], 4)))

lens = df['string'].str.split(df['string'], 4).map(len)
res = pd.DataFrame({'date': np.repeat(df['date'], lens), 'string': chainer(df['string'])})

But I get the error: TypeError: unhashable type: 'Series'. How can I fix this issue?

CodePudding user response:

Explode Chunks

df.assign(
    string=[
        [x[i:i + 4] for i in range(0, len(x), 4)]
         for x in df.string]
).explode('string')

         date string
0  2002-06-01   1234
0  2002-06-01   5678
1  2002-06-02   8765
1  2002-06-02   4321
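
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same slice-and-explode idea. The sample frame is rebuilt from the question, and the helper name split_fixed and its width parameter are made up for illustration. It assumes the string column holds Python str values and that your pandas version provides DataFrame.explode (0.25 or later).

```python
import pandas as pd

# Sample data rebuilt from the question; "string" is assumed to be str, not int.
df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["12345678", "87654321"],
})

def split_fixed(s, width=4):
    """Hypothetical helper: cut s into consecutive slices of `width` characters.

    The last slice is shorter when len(s) is not a multiple of width.
    """
    return [s[i:i + width] for i in range(0, len(s), width)]

res = (
    df.assign(string=df["string"].map(split_fixed))  # each cell becomes a list of chunks
      .explode("string")                             # one row per chunk, date repeated
      .reset_index(drop=True)                        # optional: replace the repeated 0, 0, 1, 1 index
)
print(res)
```

If the column was read in as integers, converting it first with df["string"].astype(str) should make the slicing work.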

CodePudding user response:

Here is another way:

df.assign(string = df['string'].str.findall(r'\d{4}')).explode('string')

Output:

       date string
0  6/1/2002   1234
0  6/1/2002   5678
1  6/2/2002   8765
1  6/2/2002   4321
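
One thing to keep in mind with the pattern above: \d{4} only matches runs of exactly four digits, so non-digit characters and any trailing remainder shorter than four are silently dropped. If that matters for your data, a looser pattern such as .{1,4} may be a better fit. The sketch below uses a made-up Series to show the difference; the names strict and loose are for illustration only.

```python
import pandas as pd

# Made-up inputs: one with a length that is not a multiple of 4, one with letters.
s = pd.Series(["1234567", "abcd1234"])

# Pattern from the answer: keeps only complete groups of four digits.
strict = s.str.findall(r"\d{4}")

# Looser alternative (an assumption about what you want): up to four of any
# character except a newline, so the remainder and non-digits are kept.
loose = s.str.findall(r".{1,4}")

print(strict.tolist())
print(loose.tolist())
```

Here the strict pattern loses "567" and "abcd", while the loose one keeps them as their own chunks.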