I have a Pandas data frame in which one column, SourceDocuments, holds multiple lines of data in each cell (separated by \n).
SourceDocuments
PRDS-002039\nPRDS-001952\nPRDS-001956
I would like to run a for loop that reads each row and splits these lines into a list. Eventually, I want a dictionary where each value is the list of split items. For example:
SourceID
546785: ['PRDS-002039','PRDS-001952','PRDS-001956']
The dict keys (546785) are generated through another for loop. I wrote the code below but can't figure out how to do the split row by row:
valuez = []
for j in range(0, ABP215.shape[0]):
    valuez.append(ABP215['SourceDocuments'][j].split('\n'))
ABP215 is the name of the Pandas data frame.
I get this error:
AttributeError: 'float' object has no attribute 'split'
Any help would be appreciated.
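This error usually means that at least one cell in the column is empty: pandas stores a missing value as NaN, which is a Python float and has no split method. A minimal sketch that reproduces the situation, assuming a small made-up frame (the name df and its values are hypothetical, not the real ABP215 data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ABP215: the second row has a missing cell
df = pd.DataFrame({"SourceDocuments": ["PRDS-002039\nPRDS-001952", np.nan]})

# Collect the Python type stored in each cell; a missing cell is a float
cell_types = [type(value) for value in df["SourceDocuments"]]
print(cell_types)

# Count how many cells would break a plain .split() call
n_missing = int(df["SourceDocuments"].isna().sum())
print(n_missing)
```

If n_missing is greater than zero on the real frame, the loop fails on the first such row.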
CodePudding user response:
Thanks, everyone, for your help, and my apologies for not being clear in my question. Here is the answer, put together with the community's help:
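import numpy as np
import pandas as pd

# Keys: one ID per row, starting at 1001
SourceDocumentID = list(np.arange(1001, 1001 + ABP215.shape[0], 1))
keyz = []
for i in range(0, len(SourceDocumentID)):  # building the keys
    keyz.append(SourceDocumentID[i])
# Values: cast cells to str first, so NaN (a float) no longer breaks .split()
docs = ABP215['Source Document'].apply(str)
valuez = []
for j in range(0, ABP215.shape[0]):
    valuez.append(docs[j].split('\n'))
# Pair each key with its list of documents, then build a frame (one row per key)
sourcedocuments = {k: v for k, v in zip(keyz, valuez)}
SourceDoc = pd.DataFrame.from_dict(sourcedocuments, orient='index')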
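As an alternative sketch, the same dictionary can probably be built without an explicit row loop by using the .str.split accessor, which leaves missing cells as NaN instead of raising. The frame df below is a made-up stand-in for ABP215, and mapping missing cells to empty lists is an assumption about the desired result:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ABP215; the second row has a missing cell
df = pd.DataFrame(
    {"SourceDocuments": ["PRDS-002039\nPRDS-001952\nPRDS-001956", np.nan, "PRDS-000001"]}
)

# .str.split leaves missing cells as NaN rather than raising AttributeError
split_docs = df["SourceDocuments"].str.split("\n")

# Assumption: a missing cell should map to an empty list
values = [v if isinstance(v, list) else [] for v in split_docs]

# Keys 1001, 1002, ... as plain Python ints, one per row
keys = range(1001, 1001 + len(df))
source_documents = dict(zip(keys, values))
print(source_documents)
```

Note the difference from casting with .apply(str): there, a missing cell becomes the one-element list ['nan'], whereas this sketch yields an empty list for that row.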