Creating a list from a column with multiple lines

I have a pandas DataFrame in which one column, SourceDocuments, holds multiple lines of data in each cell (separated by \n).

SourceDocuments

PRDS-002039\nPRDS-001952\nPRDS-001956

I would like to run a for loop that reads each row and splits these lines into a list. Eventually, I want a dictionary where each value is the list of split items. For example:

SourceID

546785: ['PRDS-002039','PRDS-001952','PRDS-001956']

The dict keys (546785) are generated through another for loop. I wrote the code below but can't figure out how to do the split row by row:

valuez=[]    
for j in range (0,ABP215.shape[0]):
       valuez.append(ABP215['SourceDocuments'][j].split('\n'))

ABP215 is the name of the pandas DataFrame.

I get this error:

AttributeError: 'float' object has no attribute 'split'

Any help would be appreciated.

CodePudding user response：

Thanks, everyone, for your help, and my apologies for not being clear in my question. Here is the answer, put together with the community's help.

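import numpy as np
import pandas as pd

# One ID per row, starting at 1001
SourceDocumentID = list(np.arange(1001, 1001 + ABP215.shape[0], 1))

keyz = []
for i in range(0, len(SourceDocumentID)):  # building the list of keys
  keyz.append(SourceDocumentID[i])

# Cast every cell to str once, so empty (NaN) cells no longer raise
# AttributeError; note that such a cell becomes the list ['nan']
docs = ABP215['Source Document'].apply(str)
valuez = []
for j in range(0, ABP215.shape[0]):
  valuez.append(docs[j].split('\n'))
sourcedocuments = {k: v for k, v in zip(keyz, valuez)}

SourceDoc = pd.DataFrame.from_dict(sourcedocuments, orient='index')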
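
For anyone hitting the same error: it most likely comes from empty cells, which pandas stores as NaN, and NaN is a float with no .split method. Casting to str, as in the code above, avoids the exception but turns those cells into the string 'nan'. Below is a minimal sketch of an alternative that checks each cell's type instead. The sample DataFrame is made up for illustration, the column name SourceDocuments is taken from the question, and mapping empty cells to an empty list is my assumption about what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ABP215; the second cell is empty (NaN).
ABP215 = pd.DataFrame({
    "SourceDocuments": [
        "PRDS-002039\nPRDS-001952\nPRDS-001956",
        np.nan,
        "PRDS-000001",
    ]
})

# Split only real strings; an empty (NaN) cell is a float, so it is
# mapped to an empty list instead (an assumption about the desired result).
valuez = [
    cell.split("\n") if isinstance(cell, str) else []
    for cell in ABP215["SourceDocuments"]
]

# Keys start at 1001 and run one per row, as in the answer above.
keyz = range(1001, 1001 + len(ABP215))

sourcedocuments = dict(zip(keyz, valuez))
print(sourcedocuments)
```

If the keys come from another loop, as in the question, any sequence of the same length as the DataFrame can be passed to zip in place of the range.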