I have a CSV type file that has the header and value in the same segment like so:
"field1=value1", "field2=value2", etc
Is there a way to pull the fields to a header and the values as rows beneath? I currently have:
df = pd.read_csv(file, sep='=')
df.to_csv(file ".csv", index=None)
The output to the above is:
field1=value1, field2 value2, etc
The output I desire:
field1, field2, etc
value1, value2, etc
value, value, etc
Thank you in advance!
CodePudding user response:
You could try something like this:
import csv
data = []
fieldnames = set()
with open('input.csv') as f_input:
csv_input = csv.reader(f_input, skipinitialspace=True)
for row in csv_input:
d_row = {}
for entry in row:
if entry: # skip empty entries
key, value = entry.split('=')
d_row[key] = value
fieldnames.add(key)
data.append(d_row)
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
csv_output.writeheader()
csv_output.writerows(data)
or to create a dataframe:
df = pd.DataFrame(data, columns=sorted(fieldnames))
print(df)
This reads each row using a standard csv.reader()
and then splits each field into key value pairs and stored into a row dictionary which are then added to data
. A set is used to learn all the fieldnames being used.
So if your input CSV file is:
"field1=value1","field2=value2"
"field1=value3","field2=value4"
The output.csv
file would be:
field1,field2
value1,value2
value3,value4