Home > front end >  python dataframe unique values
python dataframe unique values

Time:11-10

I dont have experience with dataframes and i stuck in the following problem:

There is a table looking like that:

enter image description here

parent account  account number  account name    code
0   parent 1    123122  account1    1
1   parent 1    456222  account2    1
2   parent 1    456334  account3    1
3   parent 2    456446  account4    1
4   parent 2    456558  account5    2
5   parent 2    456670  account6    3
6   parent 2    456782  account7    1
7   parent 2    456894  account8    1
8   parent 2    457006  account9    1
9   parent 2    457118  account10   1
10  parent 2    457230  account11   1
11  parent 2    457342  account12   1
12  parent 2    457454  account13   1
13  parent 2    457566  account14   1
14  parent 3    457678  account15   1
15  parent 3    457790  account16   1
16  parent 4    457902  account17   5
17  parent 4    458014  account18   5
18  parent 4    458126  account19   5
19  parent 4    458238  account20   5
20  parent 4    458350  account21   1

I need to check which parents have only one version of code(last column) and which have more

the needed output is table looking like the sample but every parent with only one version of code is not included

> import pandas as pd
> 
> read by default 1st sheet of an excel file
> dataframe1 = pd.read_excel("./input/dane.xlsx")

> parents = dataframe1.groupby(["parent account", "code"])


This is the only output I've got on that moment, its something but this is not the result i need

> for i in parents["parent account"]:
> print(list(i)[0])
> ```
> ('parent 1', 1)
> ('parent 2', 1)
> ('parent 2', 2)
> ('parent 2', 3)
> ('parent 3', 1)
> ('parent 4', 1)
> ('parent 4', 5)

Could you please help me with that?

CodePudding user response:

First obtain a list of parent accounts such that they have more than 1 distinct code

condition = df.groupby('parent account').code.nunique() > 1

parent_list = list( condition.index[condition.values] )

Then apply the filter on your data

df[ df['parent acount'].isin(parent_list) ]
  • Related