I have been working on this project for a couple months - taking some generated xls and xlsx documents and using a combo of the csv module and pandas (python) to rearrange the whole order of the data so it will be appropriate for manual upload to a system that requires a certain data order for correct import.
There are several different documents, each with its own original structure, as well as many templates for the import. Besides rearranging the data, I have also needed to add some internal codes to some of the documents, which we need for local student work management, and rename some columns to match the import requirements. I have all this working except for one thing: student names and instructor names are currently listed as [lastname,firstname] and NEED to be [lastname, firstname], with a SPACE added after the comma and before the first name. All my efforts to code this have failed.
I have messed around with a for loop using regex and something as simple as
df.replace(',', ', ', regex=True)
OR
df["Column Name"].str.replace(',', ', ') (which I think is way wrong, but tried it anyway)
What else might I try to accomplish what I would think of as being simple? Nothing seems to be working. I run the script and get no errors, and yet the change is not being made. I have looked all over Stack Overflow, but am not having success.
Thank you in advance.
CodePudding user response:
You could use a regular expression to ensure there is always just one space:
import pandas as pd
data = [['flintstone,fred'], ['flintstone, wilma'], ['rubble, barney']]
df = pd.DataFrame(data, columns=['Name'])
df['Name'] = df['Name'].str.replace(', *', ', ', regex=True)
print(df)
Giving you:
                Name
0   flintstone, fred
1  flintstone, wilma
2     rubble, barney
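Since the question mentions both student and instructor names, the same pattern can be applied to several columns in a loop. The sketch below is only an illustration: the column names "Student", "Instructor" and "Course" and the sample rows are made up, so substitute the real headers from your files. It uses `,\s*` instead of `, *` so that tabs or runs of spaces after the comma are also collapsed to a single space, and it leaves columns that are not listed untouched.

```python
import pandas as pd

# Hypothetical column names and sample rows, for illustration only.
df = pd.DataFrame({
    "Student": ["flintstone,fred", "rubble,  betty"],
    "Instructor": ["slate,george", "slate, george"],
    "Course": ["MATH,101", "ART,200"],
})

# Only the columns listed here are changed; "Course" keeps its comma as-is.
name_columns = ["Student", "Instructor"]

for col in name_columns:
    # A comma followed by any amount of whitespace becomes a comma plus one space.
    # The result is assigned back, because str.replace returns a new Series.
    df[col] = df[col].str.replace(r",\s*", ", ", regex=True)

print(df)
```

Limiting the replacement to named columns avoids accidentally altering other fields that contain commas.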
CodePudding user response:
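replace returns a new DataFrame rather than modifying df in place, so assign the result back if you do other operations on that df afterwards. Also pass regex=True; without it, DataFrame.replace only matches cells whose entire value is ',':
df = df.replace(',', ', ', regex=True)
Otherwise, please add more lines of your code to show what you do after the replace operation.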
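A likely reason the script runs without errors and still changes nothing is that the result of replace is being thrown away. The following is a minimal sketch with made-up sample data that compares three variants: calling replace without assigning, assigning without regex=True, and assigning with regex=True.

```python
import pandas as pd

# Made-up sample data in the [lastname,firstname] form from the question.
df = pd.DataFrame({"Name": ["flintstone,fred", "rubble,barney"]})

# 1) The return value is discarded, so df itself is not modified.
df.replace(",", ", ", regex=True)
after_unassigned = df["Name"].tolist()

# 2) Without regex=True, only cells whose whole value equals "," are matched,
#    so nothing changes here even though the result is captured.
after_no_regex = df.replace(",", ", ")["Name"].tolist()

# 3) Assigning the result back with regex=True performs the substring replacement.
df = df.replace(",", ", ", regex=True)
after_assigned = df["Name"].tolist()

print(after_unassigned)
print(after_no_regex)
print(after_assigned)
```

Note that this plain ',' pattern would produce a double space in any name that already has a space after the comma; a pattern such as ', *' avoids that.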