I have a CSV file with this structure:
code1 code2 code3 name1 name2 something1 something2
14355 12345 54133 part1 part12 aaaaaaaa bbbbbbb
54782 57815 52781 part2 part22 ccccccc ffffffff
14515 52495 52852 part3 part33 ddddddd sssssss
I need to parse this CSV file and create a new CSV file with my own headers and only the columns I need, for example:
code_1 code_2 name_1 name_2 something_2
14355 12345 part1 part12 bbbbbbb
54782 57815 part2 part22 ffffffff
14515 52495 part3 part33 sssssss
I know that I can select a single column and write it to another file using pandas:
import pandas as pd
df = pd.read_csv(file)
df1 = df['code1']
But how can I select multiple columns and write them all to one file?
CodePudding user response:
You can select multiple columns by using a list:
df1 = df[['code1', 'code2', 'name1', 'name2', 'something2']]
You can then change the column names using another list:
df1.columns = ['code_1', 'code_2', 'name_1', 'name_2', 'something_2']
Then you can write the result back to a CSV file (index=False keeps the row index out of the output):
df1.to_csv('new_filename.csv', index=False)
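The steps above can be combined into one short script. This is only a sketch: the in-memory sample data and the comma delimiter are assumptions made so the example runs on its own, and the `new_names` dictionary is a hypothetical helper, not something from the question.

```python
import io
import pandas as pd

# Sample data mirroring the question; a comma delimiter is assumed here.
raw = io.StringIO(
    "code1,code2,code3,name1,name2,something1,something2\n"
    "14355,12345,54133,part1,part12,aaaaaaaa,bbbbbbb\n"
    "54782,57815,52781,part2,part22,ccccccc,ffffffff\n"
    "14515,52495,52852,part3,part33,ddddddd,sssssss\n"
)

# Hypothetical mapping: original header -> new header.
# The keys double as the list of columns to keep.
new_names = {
    "code1": "code_1",
    "code2": "code_2",
    "name1": "name_1",
    "name2": "name_2",
    "something2": "something_2",
}

df = pd.read_csv(raw)

# Select the wanted columns, then rename them in one step.
result = df[list(new_names)].rename(columns=new_names)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Renaming through a dictionary ties each new name to its original column, so the result does not depend on the order of two separate lists. To write a file instead, pass a path as the first argument to `to_csv`.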
CodePudding user response:
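The easiest approach is to read only the columns you care about, which also saves memory. Note that usecols takes the column names as they appear in the file, so rename them afterwards:
df = pd.read_csv(file, usecols=["code1", "code2", "name1", "name2", "something2"])
df.columns = ["code_1", "code_2", "name_1", "name_2", "something_2"]
df.to_csv("other_file.csv", index=False)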
Another option, if you already have a df you want to subset, is to use a list of the original column names to select the columns you care about:
df = df[["code1", "code2", "name1", "name2", "something2"]]
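As a variation on the usecols approach, the new headers can also be supplied at write time through the `header` parameter of `to_csv`, which accepts a list of aliases. The sketch below assumes comma-separated input and uses in-memory sample data; the `wanted` and `aliases` names are hypothetical.

```python
import io
import pandas as pd

# Sample data mirroring the question; a comma delimiter is assumed here.
raw = io.StringIO(
    "code1,code2,code3,name1,name2,something1,something2\n"
    "14355,12345,54133,part1,part12,aaaaaaaa,bbbbbbb\n"
    "54782,57815,52781,part2,part22,ccccccc,ffffffff\n"
    "14515,52495,52852,part3,part33,ddddddd,sssssss\n"
)

wanted = ["code1", "code2", "name1", "name2", "something2"]
aliases = ["code_1", "code_2", "name_1", "name_2", "something_2"]

# usecols limits parsing to the listed columns.
df = pd.read_csv(raw, usecols=wanted)

# usecols keeps the file's column order, not the list's order,
# so reindex explicitly before pairing columns with aliases.
df = df[wanted]

# header=aliases writes the new names in place of the original ones.
out = df.to_csv(index=False, header=aliases)
print(out)
```

The DataFrame itself keeps its original column names here; only the written output carries the new headers.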