I have two rows of data, in rows 4 and 5. Row 4 has the titles for the data and row 5 holds the actual data. I want to sort them out into some kind of table format. I am completely new to Python, so I don't even know where to start. It's a CSV file and I want the output to be a CSV file as well. This is what the data looks like:
A | B | C | D | A | B | C | D | A | B | C | D |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
I would like data to look something like this if possible:
A | B | C | D |
---|---|---|---|
0 | 1 | 2 | 3 |
4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 |
So I want to sort it out by the titles, but since the row is not a header row I don't know what to do. Again, the titles "A", "B", "C", "D" are in row 4 and the data 0, 1, 2, 3, ... are in row 5. Any help would be appreciated.
CodePudding user response:
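You can use pandas to read the CSV file and then reshape the values into a pandas.DataFrame with one column per title. Here is some sample code:
import pandas as pd
# Titles are on row 4 and values on row 5, so skip the first three lines and read two.
raw = pd.read_csv('file.csv', header=None, skiprows=3, nrows=2)
# Unique titles in their original order: A, B, C, D
titles = list(dict.fromkeys(raw.iloc[0]))
# Assumes the titles repeat in the same order, so the values split into rows of len(titles).
df = pd.DataFrame(raw.iloc[1].to_numpy().reshape(-1, len(titles)), columns=titles)
df.to_csv('output.csv', index=False)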
CodePudding user response:
You can use a dictionary to store the original data, using the first row as the dictionary keys. Then you can use pandas to create your final CSV file. Something like this:
from collections import defaultdict
import pandas

# read the two rows
with open('data.txt') as ifile:
    headers = [name.strip() for name in ifile.readline().split(",")]
    values = [int(value.strip()) for value in ifile.readline().split(",")]

# use a dictionary to store the data, using the
# names in the first row as dictionary keys
dd = defaultdict(list)
for name, val in zip(headers, values):
    dd[name].append(val)

# use the pandas package to create the csv
data_frame = pandas.DataFrame.from_dict(dd)
data_frame.to_csv("final.csv", index=False)
I am assuming that your data.txt file contains:
A,B,C,D,A,B,C,D,A,B,C,D
0,1,2,3,4,5,6,7,8,9,10,11
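If you would rather not depend on pandas, the same grouping can probably be done with the standard csv module. The following is only a minimal sketch: the function name regroup, its title_index and value_index parameters, and the three filler rows in the sample are made up for illustration, and it assumes the titles sit on row 4 and the values on row 5 as described in the question.

```python
import csv
import io
from collections import defaultdict


def regroup(lines, title_index=3, value_index=4):
    """Group the values on one row under the repeated titles on another row.

    The indices are 0-based, so the defaults point at rows 4 and 5.
    """
    rows = list(csv.reader(lines))
    titles = [cell.strip() for cell in rows[title_index]]
    values = [cell.strip() for cell in rows[value_index]]

    columns = defaultdict(list)
    for title, value in zip(titles, values):
        if title:  # ignore empty cells, e.g. from a trailing comma
            columns[title].append(value)

    header = list(columns)  # titles in first-seen order
    # zip stops at the shortest column, so incomplete trailing groups are dropped
    body = [list(row) for row in zip(*columns.values())]
    return header, body


# Hypothetical sample: three filler rows, then the titles and the values.
sample = io.StringIO(
    "first row\n"
    "second row\n"
    "third row\n"
    "A,B,C,D,A,B,C,D,A,B,C,D\n"
    "0,1,2,3,4,5,6,7,8,9,10,11\n"
)
header, body = regroup(sample)

# Write the regrouped table as CSV (here into memory instead of a file).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(body)
print(out.getvalue())
```

For a real file, you would pass open('file.csv', newline='') instead of the in-memory sample and write to open('output.csv', 'w', newline='') instead of the StringIO buffer. The values stay as strings here, which is fine if they are only being written back out.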