Creating a list from a column with multiple lines

Time:05-09

I have a pandas DataFrame in which one column, called SourceDocuments, holds multiple lines of data in each cell (separated by \n).

SourceDocuments

PRDS-002039\nPRDS-001952\nPRDS-001956

I would like to run a for loop that reads each row and splits these lines into a list. Eventually, I want to have a dictionary where each value is the list of split items. For example:

SourceID

546785: ['PRDS-002039','PRDS-001952','PRDS-001956']

The dict keys (such as 546785) are generated by another for loop. I wrote the code below, but I can't figure out how to do the split row by row:

valuez=[]    
for j in range (0,ABP215.shape[0]):
       valuez.append(ABP215['SourceDocuments'][j].split('\n'))

ABP215 is the name of the pandas DataFrame.

I get this error:

AttributeError: 'float' object has no attribute 'split'

Any help would be appreciated.

CodePudding user response:

Thanks, everyone, for your help, and my apologies for not being clear in my question. Here is the answer, put together with the community's help.

import numpy as np
import pandas as pd

# One ID per row of ABP215, starting at 1001
SourceDocumentID = list(np.arange(1001, 1001 + ABP215.shape[0], 1))

# Build the dict keys
keyz = []
for i in range(len(SourceDocumentID)):
    keyz.append(SourceDocumentID[i])

# Cast every cell to str once, so empty (NaN) cells no longer raise AttributeError
docs = ABP215['Source Document'].apply(str)

# Split each cell on newlines
valuez = []
for j in range(ABP215.shape[0]):
    valuez.append(docs.iloc[j].split('\n'))
sourcedocuments = {k: v for k, v in zip(keyz, valuez)}

SourceDoc = pd.DataFrame.from_dict(sourcedocuments, orient='index')
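
A note on why the error appears, with a possible alternative. The 'float' in the traceback is most likely an empty cell: pandas stores missing values as NaN, which is a float and has no split method. The sketch below is not the code from the answer above. It uses a small made-up DataFrame named df in place of ABP215, assumes the column is called SourceDocuments as in the question, and lets the .str accessor do the split so that no row-by-row loop is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ABP215; the second cell is empty (NaN).
df = pd.DataFrame({
    "SourceDocuments": [
        "PRDS-002039\nPRDS-001952\nPRDS-001956",
        np.nan,
        "PRDS-000001",
    ]
})

# A missing cell is a float, which is why calling .split on it fails.
missing_cell = df["SourceDocuments"][1]
print(type(missing_cell))

# .str.split works element-wise and leaves missing cells as NaN.
split_docs = df["SourceDocuments"].str.split("\n")

# Replace the remaining NaN entries with empty lists.
values = [v if isinstance(v, list) else [] for v in split_docs]

# Keys start at 1001, mirroring the answer above (an assumption about the real IDs).
keys = range(1001, 1001 + len(df))
source_documents = dict(zip(keys, values))
print(source_documents)
```

One difference worth knowing: with apply(str), an empty cell becomes the string 'nan' and ends up in the dictionary as ['nan'], whereas this version stores an empty list for it. Which behavior is preferable depends on how the dictionary is used later.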