I have a DataFrame like the one below, and I want to split the string column into multiple rows, each holding one 4-character chunk of the original value.
date, string
2002-06-01, 12345678
2002-06-02, 87654321
Expected output:
date, string
2002-06-01, 1234
2002-06-01, 5678
2002-06-02, 8765
2002-06-02, 4321
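To reproduce this, the sample input and the expected output can be built directly. This is a sketch that assumes both columns hold strings. If the data were loaded with read_csv, the string column would probably be parsed as int64 instead.

```python
import pandas as pd

# Sample input from the question; the string column is kept as str
# (assumption: read_csv would likely parse these values as int64)
df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["12345678", "87654321"],
})

# Expected result: one row per 4-character chunk, with the date repeated
expected = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-01", "2002-06-02", "2002-06-02"],
    "string": ["1234", "5678", "8765", "4321"],
})

print(df)
print(expected)
```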
I tried the approach from the question "Split cell into multiple rows in pandas dataframe":
import numpy as np
import pandas as pd
from itertools import chain

def chainer(s):
    return list(chain.from_iterable(s.str.split(df['string'], 4)))

lens = df['string'].str.split(df['string'], 4).map(len)
res = pd.DataFrame({'date': np.repeat(df['date'], lens), 'string': chainer(df['string'])})
But I get the error TypeError: unhashable type: 'Series'. How can I fix this?
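Series.str.split(pat, n) splits each value on the separator pat, with n capping the number of splits, so it cannot produce fixed-width pieces even with valid arguments. Passing the Series df['string'] as pat is what raises the TypeError. The most likely cause is that pandas tries to compile pat as a regular expression, and Python's re module caches compiled patterns in a dict, which needs a hashable key. The sketch below keeps the chain/np.repeat structure of the attempt but swaps str.split for plain string slicing. The name chunks is a hypothetical helper, not a pandas function.

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["12345678", "87654321"],
})

def chunks(s, width=4):
    # Fixed-width slices of one string; the last chunk may be shorter
    return [s[i:i + width] for i in range(0, len(s), width)]

# One list of chunks per row (astype(str) also covers an integer column)
pieces = df["string"].astype(str).map(chunks)

# How many output rows each input row produces
lens = pieces.map(len)

res = pd.DataFrame({
    # Repeat the raw values so the old index is not carried along
    "date": np.repeat(df["date"].to_numpy(), lens.to_numpy()),
    "string": list(chain.from_iterable(pieces)),
})
print(res)
```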
Answer:
Explode chunks
# Replace each string with a list of 4-character slices, then give each slice its own row
df.assign(
    string=[
        [x[i:i+4] for i in range(0, len(x), 4)]
        for x in df.string]
).explode('string')
date string
0 2002-06-01 1234
0 2002-06-01 5678
1 2002-06-02 8765
1 2002-06-02 4321
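The same idea can be wrapped in a reusable function. This is a sketch: explode_chunks is a hypothetical helper name, and it assumes pandas 1.1 or later for the ignore_index argument of DataFrame.explode. It also converts the column with astype(str) first, because len() fails on integers, which is what the column would likely contain if it came from read_csv.

```python
import pandas as pd

def explode_chunks(df, col, width=4):
    """Return a copy of df with one row per fixed-width chunk of df[col].

    Hypothetical helper generalising the answer above; when a string's
    length is not a multiple of width, the shorter last chunk is kept.
    """
    values = df[col].astype(str)  # works even if the column holds integers
    chunked = [[v[i:i + width] for i in range(0, len(v), width)] for v in values]
    # A Series of lists aligned to df's index, one list per row
    lists = pd.Series(chunked, index=df.index)
    # ignore_index=True renumbers rows 0..n-1 instead of repeating 0, 0, 1, 1
    return df.assign(**{col: lists}).explode(col, ignore_index=True)

df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": [12345678, 87654321],  # integers, as read_csv would likely produce
})
out = explode_chunks(df, "string")
print(out)
```

Keeping the repeated index (the default) can be handy if you later want to join the chunks back to the original rows; ignore_index=True simply gives a clean 0..n-1 index.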
Answer:
Here is another way:
df.assign(string=df['string'].str.findall(r'\d{4}')).explode('string')
Output:
date string
0 6/1/2002 1234
0 6/1/2002 5678
1 6/2/2002 8765
1 6/2/2002 4321
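One caveat with the pattern r'\d{4}': it only matches runs of four digits, so a trailing partial chunk or any non-digit characters are silently dropped. The sketch below contrasts it with r'.{1,4}', which takes up to four characters of any kind and keeps a shorter last chunk. The sample strings are made up for illustration and are not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2002-06-01", "2002-06-02"],
    "string": ["123456789", "ab12cd34"],  # hypothetical inputs for illustration
})

# \d{4}: the trailing "9" is dropped, and "ab12cd34" yields no matches at all
digits_only = df["string"].str.findall(r"\d{4}")

# .{1,4}: any characters, greedily up to four, so a shorter last chunk is kept
any_chars = df["string"].str.findall(r".{1,4}")

print(digits_only.tolist())
print(any_chars.tolist())

# Same explode step as the answer, using the more permissive pattern
out = df.assign(string=any_chars).explode("string", ignore_index=True)
print(out)
```

For the all-digit, length-8 values in the question, both patterns give the same result; the difference only shows up with other inputs.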