Names with commas need a space added before the first name

Time:11-30

I have been working on this project for a couple of months: taking some generated xls and xlsx documents and using a combination of the csv module and pandas (Python) to rearrange the order of the data so that it is suitable for manual upload to a system that requires a specific column order for a correct import.

There are several different documents, each with its own original structure, as well as many templates for the import. Besides rearranging the data, I have also needed to add some internal codes to some of the documents, which we need for local student work management, and rename some columns to match the import requirements. I have all of this working, except for one thing: student names and instructor names are currently listed as [lastname,firstname] and NEED to be [lastname, firstname], with a SPACE added after the comma and before the first name. None of my attempts to code this have worked.

I have messed around with a for loop using regex and something as simple as

df.replace(',', ', ', regex=True) 

OR

df["Column Name"].str.replace(',', ', ') (which I think is way wrong, but tried it anyway)

What else might I try to accomplish what I would have thought was simple? Nothing seems to be working. I run the script and get no errors, yet the change is not made. I have looked all over Stack Overflow without success.

Thank you in advance.

CodePudding user response:

You could use a regular expression to ensure there is always just one space:

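import pandas as pd

# Sample names with zero, one, and several spaces after the comma
data = [['flintstone,fred'], ['flintstone, wilma'], ['rubble,     barney']]
df = pd.DataFrame(data, columns=['Name'])
# Match a comma plus any spaces that follow it; replace with a comma and one space
df['Name'] = df['Name'].str.replace(', *', ', ', regex=True)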

print(df)

Giving you:

                Name
0   flintstone, fred
1  flintstone, wilma
2     rubble, barney
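
Since the question mentions both student and instructor names, the same idea can be applied to several columns at once. The following is only a sketch: the column names "Student Name" and "Instructor Name", the helper normalize_name_commas, and the sample data are all hypothetical, and the pattern is widened to \s*,\s* so that stray whitespace before the comma is also removed.

```python
import pandas as pd

# Hypothetical column names; substitute the ones used in your own export.
NAME_COLUMNS = ["Student Name", "Instructor Name"]


def normalize_name_commas(df, columns):
    """Return a copy of df with exactly one space after each comma in `columns`."""
    out = df.copy()
    for col in columns:
        # \s*,\s* matches a comma together with any whitespace on either side
        out[col] = out[col].str.replace(r"\s*,\s*", ", ", regex=True)
    return out


# Made-up sample data for illustration only
df = pd.DataFrame({
    "Student Name": ["flintstone,fred", "rubble,   barney"],
    "Instructor Name": ["slate, george", "slaghoople ,pearl"],
})

# Assign the result back; the helper does not modify df in place
df = normalize_name_commas(df, NAME_COLUMNS)
print(df)
```

Returning a copy keeps the original frame intact, which can make it easier to compare before and after while debugging.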

CodePudding user response:

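replace returns a new DataFrame rather than modifying the existing one, so the result has to be assigned back if you go on to use that df. Note also that DataFrame.replace only matches whole cell values unless regex=True is passed. Try:

df = df.replace(',', ', ', regex=True)

Otherwise, please add more lines of your code to show what you do after the replace operation.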
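
A small sketch of the likely reasons the script ran without errors yet changed nothing. The sample data and the variable names unchanged, whole_cell, and fixed are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["flintstone,fred", "rubble,barney"]})

# 1. The result is computed but discarded, so df keeps its old values.
df["Name"].str.replace(",", ", ", regex=False)
unchanged = df["Name"].tolist()

# 2. Without regex=True, DataFrame.replace compares whole cell values,
#    so it only affects cells that are exactly "," and nothing matches here.
whole_cell = df.replace(",", ", ")["Name"].tolist()

# 3. Substring replacement, with the result assigned back to df.
df = df.replace(",", ", ", regex=True)
fixed = df["Name"].tolist()

print(unchanged)
print(whole_cell)
print(fixed)
```

Note that a plain ',' pattern would add a second space to names that already have one, which is why the pattern ', *' from the other answer is the safer choice on mixed data.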
