Hey I have a DataFrame like this:
d = {"KEY": ["KEY2", "KEY2"], "String value": ["value 1", "value 2"], "list value": [["val1"], ["val2"]]}
df = pd.DataFrame(d)
df
In KEY
column there is the same value in both rows. What I want to do is to create one row from this two rows (or more in my DataFrame) in such a way that values from from a given columns (except KEY
and possibly one more column) are added. So finally I want sth like this:
d2 = {"KEY": ["KEY2"], "String value": ["value 1" "value 2"], "list value": [["val1"] ["val2"]]}
res = pd.DataFrame(d2)
res
How can I do that?
CodePudding user response:
groupby
and sum
it is possible to process columns separately and leave some columns "unchanged" by keeping the first value, assuming it is the same for rows with the same key.
d = {"KEY": ["KEY2", "KEY2"], "String value": ["value 1", "value 2"], "list value": [["val1"], ["val2"]],
"other value": [1, 1]}
df = pd.DataFrame(d)
df2 = pd.DataFrame(df.groupby('KEY')['String value'].apply(lambda x:'\n'.join(x)))
df2['list value'] = df.groupby('KEY')['list value'].sum()
df2['other value'] = df.groupby('KEY')['other value'].first()
df2
KEY String value list value
0 KEY2 value 1value 2 [val1, val2]