Home > database >  Retype csv file in python
Retype csv file in python

Time:07-29

I got a csv file with two headers, and I don't know how to express it either, I pasted it and this is what it looks like, I need to reorder it to be a normal csv file,No information in "age" key,I just want to retrieve "name" and "age",I need to output "first_name","last_name","age". And use "first_name","last_name","age" as the title,

"ID","meta_key","meta_data"
1,"nickname","dale ganger"
2,"first_name","ganger"
3,"last_name","dale"
4,"age",
5,"sex","F"
6,"nickname","dale ganger"
7,"first_name","ganger"
8,"last_name","dale"
9,"age",
10,"sex","F"
11,"nickname","dale ganger"
12,"first_name","ganger"
13,"last_name","dale"
14,"age",
15,"sex","F"

I used this code, but it doesn't merge the headers,

import pandas as pd
pd.read_csv('input.csv', header=None).T.to_csv('output.csv', header=False, index=False)

output

ID,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
meta_key,nickname,first_name,last_name,age,sex,nickname,first_name,last_name,age,sex,nickname,first_name,last_name,age,sex
meta_data,dale ganger,ganger,dale,,F,dale ganger,ganger,dale,,F,dale ganger,ganger,dale,,F

The final look I want

nickname,first_name,last_name,age,sex
dale ganger,ganger,dale,,F
dale ganger,ganger,dale,,F
dale ganger,ganger,dale,,F

CodePudding user response:

try:

df_csv = pd.DataFrame('data.csv')

df = df_csv.drop('ID', axis=1).transpose()
df.columns = df.iloc[0]
df = df.iloc[1: , :].reset_index(drop=True)
df = df[['first_name', 'last_name', 'age']]

to replicate everything:

import pandas as pd

data = {'ID': [1, 2, 3, 4, 5],
        'meta_key': ['nickname', 'first_name', 'last_name', 'age', 'sex'],
        'meta_data': ['dale ganger', 'ganger', 'dale', '', 'F']}
df_csv = pd.DataFrame(data)

print(df_csv) # before change

df = df_csv.drop('ID', axis=1).transpose()
df.columns = df.iloc[0] # or use below
# df.columns = ['nickname', 'first_name', 'last_name', 'age', 'sex']
df = df.iloc[1: , :].reset_index(drop=True)
df = df[['first_name', 'last_name', 'age']]
print(df) # after change

output is:

first_name last_name age
0 ganger dale

I see you changed your question, now the data is repeating every 5 rows. Then I would do this below:

df = pd.read_csv('unstructured.csv')

# create a dictionary to store the data each iteration
data_dict = {'nickname': [], 'first_name': [], 'last_name': [], 'age': [], 'sex': []}

for i in range(0, len(df), 5):
    data_dict['nickname'].append(df['meta_data'][i])
    data_dict['first_name'].append(df['meta_data'][i 1])
    data_dict['last_name'].append(df['meta_data'][i 2])
    data_dict['age'].append(df['meta_data'][i 3])
    data_dict['sex'].append(df['meta_data'][i 4])

new_df = pd.DataFrame(data_dict)

print(new_df)
  • Related