How to store the basenames of multiple paths in a column in pandas

Time:11-29

I have a pandas DataFrame like this:


Term             DocFreq  TermFreq   Ngram   Filenames

witness says     1        1          2       '/Users/KieraKatsalapov/Desktop//LuceneIndexing/Docs/cnnValBartCnnDocs/doc657.txt'
witness says of  2        2          3       '/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc192.txt,/Users/KieraKatsalapov/Desktop/LuceneIndexing/Docs/cnnValBartCnnDocs/doc153.txt'
...

I need to convert the file paths to their basenames. I know I can do this with:

df['Filenames'] = df['Filenames'].apply(os.path.basename)

But this keeps only the last filename in each cell. For example, it converts the Filenames value in the second row to just "doc153.txt".

What I need instead is "doc192.txt, doc153.txt".
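The reason is that os.path.basename treats the whole cell as one path and returns only the text after its final '/', so the first file name is discarded along with everything else before the last slash. A minimal check, using a shortened, made-up path string in place of the real ones:

```python
import os

# One cell holding two comma-separated paths (shortened, illustrative values)
cell = "/docs/doc192.txt,/docs/doc153.txt"

# basename sees a single path and keeps only the part after the last "/"
result = os.path.basename(cell)
print(result)
```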

I assume I need a lambda function that takes the whole Filenames value and returns the basenames of all the paths in it, but I don't know how to write it.

Please help.

Answer:

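You can split each value on ',', call os.path.basename on each part, and then join the results back with ',' (use ', ' instead if you want a space after each comma, as in the desired output):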

import os
df['Filenames'] = df['Filenames'].apply(lambda x: ','.join(os.path.basename(y) for y in x.split(',')))
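Below is a self-contained sketch that runs the split/basename/join approach on a small DataFrame modeled on the question's two rows (the paths are shortened and made up). It is followed by an alternative built from pandas string methods, explode and groupby. The alternative is only a suggestion, not part of the original answer. Both versions assume POSIX-style '/' separators and join with ', ' to match the output asked for in the question.

```python
import os

import pandas as pd

# Two rows modeled on the question's data (paths shortened, illustrative only)
df = pd.DataFrame({
    "Term": ["witness says", "witness says of"],
    "Filenames": [
        "/Users/me/Docs/doc657.txt",
        "/Users/me/Docs/doc192.txt,/Users/me/Docs/doc153.txt",
    ],
})

# Approach from the answer: split on ",", take each basename, join back
df["Basenames"] = df["Filenames"].apply(
    lambda x: ", ".join(os.path.basename(p.strip()) for p in x.split(","))
)

# Alternative: one path per row via explode, strip the directory part with a
# greedy regex (everything up to the last "/"), then regroup by original index
exploded = df["Filenames"].str.split(",").explode().str.strip()
df["Basenames_alt"] = (
    exploded.str.replace(r".*/", "", regex=True)
    .groupby(level=0)
    .agg(", ".join)
)

print(df[["Term", "Basenames", "Basenames_alt"]])
```

Both new columns should hold "doc657.txt" for the first row and "doc192.txt, doc153.txt" for the second.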