I have a pandas DataFrame like this:
Term. DocFreq. TermFreq. Ngram. Filenames
witness says 1 1 2 '/Users/KieraKatsalapov/Desktop//LuceneIndexing/Docs/cnnValBartCnnDocs/doc657.txt'
witness says of 2 2 3 '/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc192.txt,/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc153.txt'
.
.
.
I need to convert the filenames to the basenames. I know I can do this using
df['Filenames'] = df['Filenames'].apply(os.path.basename)
But this keeps only the last filename. For example, it converts the Filenames value in the 2nd entry directly to "doc153.txt",
whereas I need it to be "doc192.txt, doc153.txt".
I assume I need a lambda function that takes the whole Filenames value and returns a string containing all of the basenames, but I don't know how to proceed.
Please help.
CodePudding user response:
You can split each value on ",", call os.path.basename on each piece, and finally join the pieces back together with ",":
import os
df['Filenames'] = df['Filenames'].apply(lambda x: ','.join(os.path.basename(y) for y in x.split(',')))
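The same split / basename / join idea can also be written as a small named function, which is easier to test than an inline lambda. This is only a sketch: the helper name to_basenames and its sep and joiner parameters are made up for illustration, and the sample strings are the two Filenames values shown in the question. It also strips spaces around each path and lets you choose ", " as the output separator, on the assumption that the spaced form "doc192.txt, doc153.txt" is what is wanted.

```python
import os


def to_basenames(paths, sep=",", joiner=","):
    """Return the basenames of a sep-delimited string of file paths.

    Hypothetical helper for illustration; not part of pandas or os.
    """
    # Split on the separator, drop stray whitespace around each path,
    # take the basename of each piece, then join the pieces back together.
    return joiner.join(os.path.basename(p.strip()) for p in paths.split(sep))


# The two Filenames values from the question, as plain strings.
samples = [
    "/Users/KieraKatsalapov/Desktop//LuceneIndexing/Docs/cnnValBartCnnDocs/doc657.txt",
    "/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc192.txt,"
    "/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc153.txt",
]

# Use ", " as the output separator to get the spaced form.
result = [to_basenames(s, joiner=", ") for s in samples]
print(result)
```

On the DataFrame itself this would presumably be applied as df['Filenames'] = df['Filenames'].apply(to_basenames, joiner=', '), since Series.apply forwards extra keyword arguments to the function.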