I am new to Python and am trying to split fixed-width text data into Excel columns and rows. Suppose I have 100 records, and each record needs to be split by character position: characters 1-7 are the first column, character 8 is the second, 9-10 the third, 11-18 the fourth, 19-24 the fifth, 25-124 the sixth, and 125-1000 the seventh. The example records below are in text.txt. I want to convert them into an Excel file based on the character positions above. Any help would be appreciated.
Example text format:
9999999M0210012021454 Copyright 2021 National Council for Prescription Drug Programs, All Rights Reserved
00301ABS LLC SO CAL AND IMW P O BOX 742382 LOS ANGELES CA9007423822083953954 6232823834 820184434 KATHY GIANNAKOPOULOS MGR, 3RD PARTY [email protected] MICHAEL MOLLSEN DIRECTOR, MANAGED CARE [email protected] JESSICA WILTS SR MGR, MANAGED CARE [email protected] MARC ALLGOOD PHARMACY SYSTEMS DIRECTOR [email protected] JUDEE OLIMPO MANAGER, 3RD PARTY AUDIT & [email protected] 0003640503199600000000
00801CVS PHARMACY INC 1 CVS DRIVE BOX 1075 WOONSOCKET RI02895 4017651500 4017707108 SUSAN COLBERT DIRECTOR, PAYER RELATIONS [email protected] ANTHONY GRATTO MANAGER, PAYER RELATIONS [email protected] 0000340101200100000000
01101THE BARTELL DRUG COMPANY 4025 DELRIDGE WAY SW STE 400 SEATTLE WA9810612737179755937 7179758659 910138195 JENNIFER ZOREK DIRECTOR [email protected] 0002571218202000000000
The records above are examples of the data that should be split, one record per row.
Example output format:
0 1 2 3 4 5 6
Headers 1. 9999 m 01 10012021
Rows 2. ------below is the records-------------
3. ---------------------------------------
CodePudding user response:
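Use the .str accessor to slice each record by character position. Here df is your DataFrame and 'your_column' is the column holding the raw records:
    # Map each new column name to its [start, end) character positions (0-based).
    column_splits = {'first': [0, 7], 'second': [7, 10]}
    for column, limits in column_splits.items():
        start, end = limits
        df[column] = df['your_column'].str[start:end]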
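As a rough sketch of how the .str slicing above could be applied to the seven ranges in the question: the 1-based inclusive ranges become 0-based slice bounds (1-7 becomes 0:7, 8-8 becomes 7:8, and so on). The column names col1 to col7, the helper name split_records, and the output file name are placeholders I made up, not anything from the question. The sample line is the header record as pasted in the question; runs of spaces may have been collapsed when it was posted, so only the first four fields are shown.

```python
import pandas as pd

# 1-based inclusive ranges from the question, rewritten as 0-based slice bounds.
# The column names are hypothetical placeholders.
COLUMN_SPLITS = {
    "col1": (0, 7),       # characters 1-7
    "col2": (7, 8),       # character 8
    "col3": (8, 10),      # characters 9-10
    "col4": (10, 18),     # characters 11-18
    "col5": (18, 24),     # characters 19-24
    "col6": (24, 124),    # characters 25-124
    "col7": (124, 1000),  # characters 125-1000
}


def split_records(lines):
    """Return a DataFrame with one row per record and one column per range."""
    raw = pd.Series(list(lines), dtype="object")
    out = pd.DataFrame(index=raw.index)
    for column, (start, end) in COLUMN_SPLITS.items():
        # Slicing past the end of a short record yields an empty string.
        out[column] = raw.str[start:end]
    return out


# Header record copied from the question (spacing as it appears in the post).
sample = ["9999999M0210012021454 Copyright 2021 National Council"]
result = split_records(sample)
print(result[["col1", "col2", "col3", "col4"]])

# To process the real file and write Excel (assumes text.txt exists and an
# Excel writer such as openpyxl is installed; "output.xlsx" is a placeholder):
# with open("text.txt") as fh:
#     split_records(line.rstrip("\n") for line in fh).to_excel("output.xlsx", index=False)
```

If trailing padding in a field is unwanted, calling .str.strip() on each sliced column is one way to remove it.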
CodePudding user response:
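You can combine itertools.tee and zip_longest.
Function to split:
    from itertools import tee, zip_longest

    def split_by_index(s):
        # Cut positions; the last slice runs to the end of the string.
        indices = [0, 7, 10, 14, 20]
        start, end = tee(indices)
        next(end)  # advance so (start, end) pairs up consecutive indices
        return " ".join([s[i:j] for i, j in zip_longest(start, end)])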
Your data:
    import pandas as pd

    df = pd.DataFrame()
    df["sentence"] = ["animals120 redlivinginjungle",
                      "animals140 redlivinginjungle",
                      "animals160 redlivinginjungle"]
sentence
0 animals120 redlivinginjungle
1 animals140 redlivinginjungle
2 animals160 redlivinginjungle
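Then apply the function to create a new DataFrame. Note that .str.split() splits on whitespace, so this works only when the fields themselves contain no internal spaces:
    new_df = df["sentence"].apply(split_by_index).str.split(expand=True)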
Output
print(new_df)
0 1 2 3 4
0 animals 120 red living injungle
1 animals 140 red living injungle
2 animals 160 red living injungle
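Both answers above slice the strings by hand. As an alternative (a different technique, not what either answer uses), pandas also ships read_fwf, which reads fixed-width text directly from a file or buffer given a list of (start, end) column bounds. Below is a minimal sketch using the same sample data and cut positions as above; the in-memory buffer stands in for a real file, and the end position 28 is simply the length of the sample lines. Fields are kept as strings with dtype=str, and read_fwf strips surrounding whitespace from each field.

```python
import io

import pandas as pd

# In-memory stand-in for a fixed-width text file.
text = (
    "animals120 redlivinginjungle\n"
    "animals140 redlivinginjungle\n"
    "animals160 redlivinginjungle\n"
)

# Half-open (start, end) bounds matching the indices [0, 7, 10, 14, 20].
colspecs = [(0, 7), (7, 10), (10, 14), (14, 20), (20, 28)]

fwf_df = pd.read_fwf(io.StringIO(text), colspecs=colspecs, header=None, dtype=str)
print(fwf_df)
```

Because the columns are cut by position rather than by whitespace, this approach also copes with fields that contain spaces, such as the names and addresses in the question's records.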