I have a DataFrame on the following format.
id | value | more columns
1 ''
1 ''
1 ''
1.1 "some text"
1.1 "xxx"
1.2 "xxx"
2 ''
2.1 ''
2.1.1 "xxx"
1 ''
3 ''
3 ''
3.1 "hi"
I want to drop duplicates and keep the first instance/row but ONLY consider the group where value is an empty string and with same id so the following output is created.
id | value | more columns
1 ''
1.1 "some text"
1.1 "xxx"
1.2 "xxx"
2 ''
2.1 ''
2.1.1 "xxx"
3 ''
3.1 "hi"
Is there any smart / easy way to do this?
CodePudding user response:
df = pd.DataFrame({'id': ['1', '1', '1', '1.1', '1.1', '1.2', '2', '2.1', '2.1.1', '1', '3', '3', '3.1'],
'value': ['', '', '', 'some text', 'xxx', 'xxx', '', '', 'xxx', '', '', '', 'hi']})
def fun(x):
if x.value.eq('').all():
return x.drop_duplicates('value')
else:
return x
df = df.groupby('id').apply(fun).reset_index(level=0,drop=True)
print(df)
id value
0 1
3 1.1 some text
4 1.1 xxx
5 1.2 xxx
6 2
7 2.1
8 2.1.1 xxx
10 3
12 3.1 hi