I got a csv file with two headers, and I don't know how to express it either, I pasted it and this is what it looks like, I need to reorder it to be a normal csv file,No information in "age" key,I just want to retrieve "name" and "age",I need to output "first_name","last_name","age". And use "first_name","last_name","age" as the title,
"ID","meta_key","meta_data"
1,"nickname","dale ganger"
2,"first_name","ganger"
3,"last_name","dale"
4,"age",
5,"sex","F"
6,"nickname","dale ganger"
7,"first_name","ganger"
8,"last_name","dale"
9,"age",
10,"sex","F"
11,"nickname","dale ganger"
12,"first_name","ganger"
13,"last_name","dale"
14,"age",
15,"sex","F"
I used this code, but it doesn't merge the headers,
import pandas as pd
pd.read_csv('input.csv', header=None).T.to_csv('output.csv', header=False, index=False)
output
ID,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
meta_key,nickname,first_name,last_name,age,sex,nickname,first_name,last_name,age,sex,nickname,first_name,last_name,age,sex
meta_data,dale ganger,ganger,dale,,F,dale ganger,ganger,dale,,F,dale ganger,ganger,dale,,F
The final look I want
nickname,first_name,last_name,age,sex
dale ganger,ganger,dale,,F
dale ganger,ganger,dale,,F
dale ganger,ganger,dale,,F
CodePudding user response:
try:
df_csv = pd.DataFrame('data.csv')
df = df_csv.drop('ID', axis=1).transpose()
df.columns = df.iloc[0]
df = df.iloc[1: , :].reset_index(drop=True)
df = df[['first_name', 'last_name', 'age']]
to replicate everything:
import pandas as pd
data = {'ID': [1, 2, 3, 4, 5],
'meta_key': ['nickname', 'first_name', 'last_name', 'age', 'sex'],
'meta_data': ['dale ganger', 'ganger', 'dale', '', 'F']}
df_csv = pd.DataFrame(data)
print(df_csv) # before change
df = df_csv.drop('ID', axis=1).transpose()
df.columns = df.iloc[0] # or use below
# df.columns = ['nickname', 'first_name', 'last_name', 'age', 'sex']
df = df.iloc[1: , :].reset_index(drop=True)
df = df[['first_name', 'last_name', 'age']]
print(df) # after change
output is:
first_name | last_name | age | |
---|---|---|---|
0 | ganger | dale |
I see you changed your question, now the data is repeating every 5 rows. Then I would do this below:
df = pd.read_csv('unstructured.csv')
# create a dictionary to store the data each iteration
data_dict = {'nickname': [], 'first_name': [], 'last_name': [], 'age': [], 'sex': []}
for i in range(0, len(df), 5):
data_dict['nickname'].append(df['meta_data'][i])
data_dict['first_name'].append(df['meta_data'][i 1])
data_dict['last_name'].append(df['meta_data'][i 2])
data_dict['age'].append(df['meta_data'][i 3])
data_dict['sex'].append(df['meta_data'][i 4])
new_df = pd.DataFrame(data_dict)
print(new_df)