Split text data into row records using Python

Time:12-15

I am new to Python and am trying to split text data into row records. Suppose I have 100 records, and each one needs to be split by character position: characters 1-7 form the first column, 8-11 the second, 12-15 the third, 16-25 the fourth, and so on. Can anyone help me with this?

Example text format:

1. animals120 redlivinginjungle
2. worldis2021 skybluecolour    

The two records above are examples of the data to be split into rows.

Output format:

1. animals  120  red  living  injungle
2. worldis  202  sky  blue    colour

CodePudding user response:

Use the pandas .str accessor to slice each string by character position:

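# Map each new column name to its start (inclusive) and end (exclusive) offsets.
column_splits = {'first': [0, 7], 'second': [7, 10]}

# df is an existing DataFrame; 'your_column' is the column holding the raw text.
for column, (start, end) in column_splits.items():
    df[column] = df['your_column'].str[start:end]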
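A minimal, self-contained sketch of the same idea, using the two sample records from the question. The column name "raw" and the names "first" and "second" are placeholders chosen for illustration, not part of the original data.

```python
import pandas as pd

# Sample rows taken from the question; the column name "raw" is an assumption.
df = pd.DataFrame({"raw": ["animals120 redlivinginjungle",
                           "worldis2021 skybluecolour"]})

# Hypothetical column names mapped to (start, end) character offsets.
# Offsets are 0-based and the end is exclusive, as in ordinary Python slicing.
column_splits = {"first": (0, 7), "second": (7, 10)}

for column, (start, end) in column_splits.items():
    # .str[start:end] slices every string in the column at once.
    df[column] = df["raw"].str[start:end]

print(df[["first", "second"]])
```

Note that the question counts positions from 1, while the slices count from 0, so "characters 1-7" becomes the slice 0:7.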

CodePudding user response:

You can combine itertools.tee and itertools.zip_longest to turn a list of cut points into (start, end) slice pairs.

Function to split:

from itertools import tee, zip_longest

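def split_by_index(s):
    # Cut points: each field runs from one index up to the next.
    indices = [0, 7, 10, 14, 20]
    start, end = tee(indices)
    next(end)  # shift the second iterator so the pairs are (0, 7), (7, 10), ...
    # zip_longest pads the final pair with None, so the last slice runs to the end of s.
    return " ".join([s[i:j] for i, j in zip_longest(start, end)])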
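To see what the tee and zip_longest combination produces, the pairs can be printed on their own. This sketch only uses the standard library and reuses the cut points from the function above; the variable names pairs, sample and fields are illustrative.

```python
from itertools import tee, zip_longest

indices = [0, 7, 10, 14, 20]
start, end = tee(indices)   # two independent iterators over the same values
next(end)                   # advance the second one by a single step

# The shorter iterator is padded with None, which Python treats as
# "slice to the end" when used as the upper bound.
pairs = list(zip_longest(start, end))
print(pairs)

sample = "animals120 redlivinginjungle"
fields = [sample[i:j] for i, j in pairs]
print(fields)
```

The third field still carries its leading space at this point; in the answer it disappears later, when the joined string is split on whitespace.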

Your data:

import pandas as pd

df = pd.DataFrame()
df["sentence"] = ["animals120 redlivinginjungle",
                  "animals140 redlivinginjungle",
                  "animals160 redlivinginjungle"]


    sentence
0   animals120 redlivinginjungle
1   animals140 redlivinginjungle
2   animals160 redlivinginjungle

Then apply the function to create a new DataFrame:

new_df = df["sentence"].apply(split_by_index).str.split(expand=True)

Output

print(new_df)

    0       1   2   3       4
0   animals 120 red living  injungle
1   animals 140 red living  injungle
2   animals 160 red living  injungle
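One caveat with joining the pieces and splitting them again on whitespace: a field that is blank, or that contains a space of its own, would shift the remaining columns. A possible variant, sketched here in plain Python, keeps the slices as a list instead. The function name split_fixed_width and its boundaries parameter are hypothetical, and the second sample record is made up to show a blank field.

```python
def split_fixed_width(line, boundaries=(0, 7, 10, 14, 20)):
    """Cut `line` at the given 0-based boundaries; the last field runs to the end."""
    ends = list(boundaries[1:]) + [None]
    # strip() removes the padding inside each field, e.g. " red" -> "red".
    return [line[i:j].strip() for i, j in zip(boundaries, ends)]

records = [
    "animals120 redlivinginjungle",
    # Made-up record whose second field is blank (three spaces of padding).
    "animals" + " " * 4 + "redlivinginjungle",
]

rows = [split_fixed_width(r) for r in records]
for row in rows:
    print(row)
```

The resulting list of lists can be passed to pd.DataFrame(rows) to get one column per field, with the blank field kept as an empty string rather than dropped.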