I have this dataframe in a csv that it's received like this:
And my goal here is to incorporate the second row in the headers row to the first blank cell to the right, in a way that my end result would be like this:
quite confuse in how to create a solution for this, any thoughts in how I could tackle this issue?
import csv
import pandas as pd
csv_file_name= 'test.csv'
file = open(csv_file_name)
csvreader = csv.reader(file)
df = pd.DataFrame(csvreader)
df.head(10)
0 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24 25 26
0 "Scenario" Annee Version Client Destination Source Activite Organisation Periode Currency ... None None None None None None None None None None
1 GMVE_STE GMVE7190 GMVE_ICO GMVE_ GMVI_ GMV_ GMA_ RCH RBC TFP ... None None None None None None None None None None
2 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ JAN KEURO ... 4.998999999999796 4.998999999999796
3 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ FEB KEURO ... 40.29999999999927 40.29999999999927
4 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ MAR KEURO ... -19.26100000000042 -19.26100000000042
5 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ APR KEURO ... -26.03800000000047 -26.03800000000047
6 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ MAY KEURO ... 324.9980000000014 324.9980000000014
7 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ JUN KEURO ... 1.818989403545856e-12 1.818989403545856e-12
8 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ JUL KEURO ... 0 0
9 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ AUG KEURO ... 0 0
10 rows × 27 columns
I need to use the csv.reader because with pandas option for reading csv's I get a error
ParserError: Error tokenizing data. C error: Expected 11 fields in line 2, saw 17
I am not able to share the csv, since I receive it already with the commas being separated, and everything that I copy here it's transformed in picture
CodePudding user response:
I think this might work :
row_1 = list(df.iloc[0])
df.columns = list(df.columns) row_1
it would be nice to have an example of dataframe so people can try it out!
CodePudding user response:
If you wan to modify the original csv, you can follow something like this
- open file
- modify the first line
- save as new file
import csv
file = 'my_file_new.csv'
file_new = 'my_file_new.csv'
with open(file) as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
header = []
for count, row in enumerate(csv_reader):
if count <= 1:
print(row)
header = header row
else:
lines.append(row)
with open(file_new, mode='w') as file:
head = csv.DictWriter(file, delimiter=',', fieldnames=header)
head.writeheader()
writer = csv.writer(file, delimiter=',')
for line in lines:
writer.writerow(line)
CodePudding user response:
using @grymlin idea, I was able to work something out that does what I want:
row_1 = list(df.iloc[0])
row_1_upd = list(filter(None,row_1))
row_2 = list(df.iloc[1])
row_2_upd = list(filter(None,row_2))
new_cols = row_1_upd row_2_upd
len(new_cols)
28
len(df.columns)
27
list_df_col = list(df.columns)
for i in range(len(list_df_col)):
list_df_col[i]=new_cols[i]
df.columns = list_df_col
df = df.iloc[2:]
df.head()
"Scenario" Annee Version Client Destination Source Activite Organisation Periode Currency ... GMA_ RCH RBC TFP TFEX LTSR_ TFS TBDT FSI TCD
2 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ JAN KEURO ... 4.998999999999796 4.998999999999796
3 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ FEB KEURO ... 40.29999999999927 40.29999999999927
4 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ MAR KEURO ... -19.26100000000042 -19.26100000000042
5 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ APR KEURO ... -26.03800000000047 -26.03800000000047
6 F1_U FY22 VD Client BR_IFRS GLOBAL_SOURCE U10 0898_ MAY KEURO ... 324.9980000000014 324.9980000000014