I dont have experience with dataframes and i stuck in the following problem:
There is a table looking like that:
parent account account number account name code
0 parent 1 123122 account1 1
1 parent 1 456222 account2 1
2 parent 1 456334 account3 1
3 parent 2 456446 account4 1
4 parent 2 456558 account5 2
5 parent 2 456670 account6 3
6 parent 2 456782 account7 1
7 parent 2 456894 account8 1
8 parent 2 457006 account9 1
9 parent 2 457118 account10 1
10 parent 2 457230 account11 1
11 parent 2 457342 account12 1
12 parent 2 457454 account13 1
13 parent 2 457566 account14 1
14 parent 3 457678 account15 1
15 parent 3 457790 account16 1
16 parent 4 457902 account17 5
17 parent 4 458014 account18 5
18 parent 4 458126 account19 5
19 parent 4 458238 account20 5
20 parent 4 458350 account21 1
I need to check which parents have only one version of code(last column) and which have more
the needed output is table looking like the sample but every parent with only one version of code is not included
> import pandas as pd
>
> read by default 1st sheet of an excel file
> dataframe1 = pd.read_excel("./input/dane.xlsx")
> parents = dataframe1.groupby(["parent account", "code"])
This is the only output I've got on that moment, its something but this is not the result i need
> for i in parents["parent account"]:
> print(list(i)[0])
> ```
> ('parent 1', 1)
> ('parent 2', 1)
> ('parent 2', 2)
> ('parent 2', 3)
> ('parent 3', 1)
> ('parent 4', 1)
> ('parent 4', 5)
Could you please help me with that?
CodePudding user response:
First obtain a list of parent accounts such that they have more than 1 distinct code
condition = df.groupby('parent account').code.nunique() > 1
parent_list = list( condition.index[condition.values] )
Then apply the filter on your data
df[ df['parent acount'].isin(parent_list) ]