Home > Software design >  use Pandas to drop values from csv
use Pandas to drop values from csv

Time:04-07

I have a csv that I want to use to search an api for data, but the row which stores the data used for the api search can contain a second value separated by ; like this:

2 Jan Rohls Kunst und Religion zwischen Mittelalter und Barock : von Dante bis Bach -
3 Karl-Markus Ritter    Der Dom zu Speyer : weltliche Macht und christlicher Glaube 9783806241280; 3806241287
4 Hape Kerkeling    Ich bin dann mal weg : meine Reise auf dem Jakobsweg    9783890296005; 3890296009

I want to remove the values after and including ; so that only the 9783XXX remains in the row ISBN. It would be perfect, if the code could be included in the existing process, but I haven't found a way how to manage that.

Here's the full extracted csv: https://pastebin.com/frAK9NBG

These are some scripts I use to

a) CLean a pre-existing csv:

import pandas as pd
import glob
import os
path= "./**/extract/"

filelist=glob.glob('./**/Reihe A/Reihe*.csv',recursive=True)
print(filelist)
for file in filelist:
    data=pd.read_csv(file, sep="\t",  encoding='utf8')
    data.columns
    title=[] #this gets all the titles
    for row in data['Titel']:
        title.append(row)
    author=[] #this gets all the authors
    for row in data['Verfasser']:
        author.append(row)
    isbn=[] #this gets all the isbns
    for row in data['ISBN']:
        isbn.append(row)
    df=pd.DataFrame({'Verfasser': author, 'Titel': title, 'ISBN': isbn}) #create new csvs based on extracted data
#save csv in set path
df.to_csv(path   file  "_"  "extract", sep='\t', encoding='utf8')
#df.to_csv(path   file  '_'   'extract.csv', sep="\t") #add endpath ,  encoding='utf-8' back if needed
print(file  '_'   'extract.csv'   ' saved to '   path)

and to

b) search the api

import glob
import pandas as pd
from urllib.request import urlopen
#import generated csvs from other script
filelist=glob.glob('./**/Reihe*_extract.csv',recursive=True)
print(filelist)
for file in filelist:
    #read csv, make a list of all isbns
    data=pd.read_csv(file, sep="\t",  encoding='utf8')
    isbnlist=[]
    for row in data['ISBN']:
        isbnlist.append(row)
    #for each isbn in list, get data from api
    for isbn in isbnlist:
        url = 'http://sru.k10plus.de/gvk!rec=1?version=1.1&operation=searchRetrieve&query=pica.isb=[isbn]&maximumRecords=10&recordSchema=marcxml'  
    url = url.replace('[isbn]', isbn)
    print(url)

as of right now, I'm splitting my codes based on their function (clean csv, search api etc.) but I plan to make a launcher script that does all for the end user.

Any help is appreciated.

CodePudding user response:

Try this, split by seperator and keep wanted split:

data['ISBN'] = [x.split(' ')[0] for x in data['ISBN']] #keeps first portion of split.
  • Related