I have a CSV file with two rows of data. The first row are the names like Red, Green, Orange, Purple and they repeat themselves like that. The second row is the data. The format is how I have written below, but in a CSV file. I want to take this and put them in separate columns like I have shown in table 2 but again in a CSV file. How do I combine the similar names and keep the data for it all? I understand I could write them out like this
lista1=["Red", "Green", "Orange", "Purple"]
lista2=[3,56,23,12,34,65,98,7,9,45,33,15]
and call on them although I have 100's of files like this and I can't change the numbers and titles each time
Table 1:
Red | Green | Orange | Purple | Red | Green | Orange | Purple | Red | Green | Orange | Purple |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 56 | 23 | 12 | 34 | 65 | 98 | 7 | 9 | 45 | 33 | 15 |
Table 2 (output):
Red | Green | Orange | Purple |
---|---|---|---|
3 | 56 | 23 | 12 |
34 | 65 | 98 | 7 |
9 | 45 | 33 | 15 |
Again the table 1 data is from CSV file and I want the desired output in a CSV file as well.
CodePudding user response:
Since, you do not need pandas
in your solution, here is one that only uses csv
module.
I read the file using csv.reader()
function. Converted the data into dictionary according to the sample input csv file you provided and then converted that dictionary into csv file.
Here is the sample csv input file :-
Red,Green,Orange,Purple,Red,Green,Orange,Purple,Red,Green,Orange,Purple 3,56,23,12,34,65,98,7,9,45,33,15
Now the code:-
import csv
with open('try.csv') as csvfile:
mixedData = csv.reader(csvfile)
column,data = mixedData
data_dict = {}
for i,name in enumerate(column) :
if name in data_dict :
data_dict[name].append(data[i])
else :
data_dict[name] = [data[i]]
with open("try_output.csv", "w",newline="") as outfile:
writer = csv.writer(outfile)
writer.writerow(data_dict.keys())
writer.writerows(zip(*data_dict.values()))
output file :-
Red,Green,Orange,Purple 3,56,23,12 34,65,98,7 9,45,33,15
CodePudding user response:
This is how I would do it. This assumes that the first row is always going to be words of some kind, and the bottom will always be numbers. As long as that's true, you don't need to know what the words are ahead of time.
First, read the data from the csv file. (I'm not reading it directly into a dataframe because column names need to be unique.)
>>> import pandas as pd
>>> import re
>>> infile = '/path/to/sample.csv'
>>> f = open(infile, 'r')
>>> text = f.read()
>>> print(text)
Red,Green,Orange,Purple,Red,Green,Orange,Purple,Red,Green,Orange,Purple
3,56,23,12,34,65,98,7,9,45,33,15
Then separate out your words and numbers, using regex:
>>> words = re.findall("[a-zA-Z] ", text)
>>> numbers = re.findall("[0-9] ", text)
>>> print(words)
>>> print(numbers)
Create your dataframe:
>>> df = pd.DataFrame({
... "Words": words,
... "Numbers": numbers
... })
>>> print(df)
Words Numbers
0 Red 3
1 Green 56
2 Orange 23
3 Purple 12
4 Red 34
5 Green 65
6 Orange 98
7 Purple 7
8 Red 9
9 Green 45
10 Orange 33
11 Purple 15
Group the words together. (This seems like a convoluted way of doing it, but I couldn't figure out a simpler one.)
>>> words_no_repeats = list(set(words))
>>> new_df = pd.DataFrame()
>>> for w in words_no_repeats:
... values = df[df['Words']==w]['Numbers'].to_list()
... temp_df = pd.DataFrame({w: values}, index=range(len(values)))
... new_df = pd.concat([new_df, temp_df], axis=1)
print(new_df)
Orange Green Red Purple
0 23 56 3 12
1 98 65 34 7
2 33 45 9 15
Then save your new dataframe as a csv:
>>> new_df.to_csv('/path/to/new_sample.csv', index=False)
This is what the csv file looks like:
Orange,Green,Red,Purple
23,56,3,12
98,65,34,7
33,45,9,15
I know you said in your comments that you were trying to avoid Pandas, but I don't know of any other way to do the grouping.