I need help combining the language values of duplicate rows into a single row and then dropping the duplicate rows: the different languages of the same Movie, Year, and Id should end up in one row. There are more similar columns in the CSV, so please help me figure out a way to combine those too. Here is the existing CSV:
f = pd.DataFrame({'Movie': ['name1','name1','name2','name3','name4','name4'],
'Year': ['1905', '1905','1906','1907','1910','1910'],
'Id': ['tt0283985', 'tt0283985','tt0284043','tt3402904','tt3458360','tt3458360'],
'language':['Mandarian','Cantonese','Mandarian','unknown','Cantonese','Mandarian']})
where f now looks like:
Movie Year Id language
0 name1 1905 tt0283985 Mandarian
1 name1 1905 tt0283985 Cantonese
2 name2 1906 tt0284043 Mandarian
3 name3 1907 tt3402904 unknown
4 name4 1910 tt3458360 Cantonese
5 name4 1910 tt3458360 Mandarian
And the result should be like this:
Movie Year Id language
0 name1 1905 tt0283985 Mandarian,Cantonese
1 name2 1906 tt0284043 Mandarian
2 name3 1907 tt3402904 unknown
3 name4 1910 tt3458360 Cantonese,Mandarian
Rows 1 and 2 are identical except for language, so they just need to be combined into one row; the same goes for rows 5 and 6. Here's my attempt:
ff = f.groupby(by=['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv", index=False)
But the output is weird: all the other columns disappeared and language is the only column left:
language
Mandarian,Cantonese
Mandarian
unknown
Cantonese,Mandarian
...
CodePudding user response:
By default, groupby sets the grouping keys as the index, and you explicitly asked to_csv not to export the index (index=False), so the key columns never reach the file.
Use as_index=False in groupby:
ff = f.groupby(by=['Movie','Year','Id'], as_index=False).agg(','.join)
ff.to_csv("File.csv", index=False)
Or, export the index in to_csv:
ff = f.groupby(by=['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv")
NB. If the intermediate result is not useful to you, you do not need to set the ff variable; you can chain the calls directly: f.groupby(...).agg(...).to_csv(...)
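As a quick check of the as_index=False variant, here is a minimal, self-contained sketch using the question's data. Writing to an in-memory buffer instead of File.csv is an assumption made only so the result can be inspected without touching the disk; the names keys, buf, and csv_text are illustrative.

```python
import io

import pandas as pd

f = pd.DataFrame({
    'Movie': ['name1', 'name1', 'name2', 'name3', 'name4', 'name4'],
    'Year': ['1905', '1905', '1906', '1907', '1910', '1910'],
    'Id': ['tt0283985', 'tt0283985', 'tt0284043', 'tt3402904', 'tt3458360', 'tt3458360'],
    'language': ['Mandarian', 'Cantonese', 'Mandarian', 'unknown', 'Cantonese', 'Mandarian'],
})

keys = ['Movie', 'Year', 'Id']

# as_index=False keeps the grouping keys as regular columns,
# so dropping the index on export no longer drops them.
ff = f.groupby(by=keys, as_index=False).agg(','.join)

# Illustrative only: write to an in-memory buffer rather than "File.csv".
buf = io.StringIO()
ff.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Joined values that contain a comma are quoted by to_csv, so the file still parses back into four columns.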
CodePudding user response:
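Use reset_index() to move the grouping keys back into regular columns:
ff = f.groupby(['Movie','Year','Id']).agg(','.join).reset_index()
With the keys back as columns, to_csv(..., index=False) should keep all of them.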
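The question also mentions more similar columns in the CSV. One possible sketch for that case applies the same join to every non-key column and skips repeated values. The country column and the join_unique helper are hypothetical, introduced here only for illustration; they are not part of the original data.

```python
import pandas as pd

f = pd.DataFrame({
    'Movie': ['name1', 'name1', 'name2'],
    'Year': ['1905', '1905', '1906'],
    'Id': ['tt0283985', 'tt0283985', 'tt0284043'],
    'language': ['Mandarian', 'Cantonese', 'Mandarian'],
    'country': ['China', 'China', 'China'],  # hypothetical extra column
})


def join_unique(values):
    """Join the distinct non-missing values of a group, keeping first-seen order."""
    as_text = values.dropna().astype(str)
    # dict.fromkeys removes duplicates while preserving insertion order.
    return ','.join(dict.fromkeys(as_text))


keys = ['Movie', 'Year', 'Id']

# agg applies join_unique to each non-key column within each group.
ff = f.groupby(keys).agg(join_unique).reset_index()
print(ff)
```

Dropping duplicates is a design choice: plain ','.join would give China,China for name1, which is rarely what is wanted when the rows differ only in language.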