I have a pandas DataFrame with a column of item numbers that are supposed to increase by 1 on each row.
import pandas as pd
df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'hhh', 'jjj']})
df1
   item_number col_A
0            1   aaa
1            2   bbb
2            3   ccc
3            4   ddd
4            5   eee
5            6   fff
6            8   hhh
7           10   jjj
As you can see, the item number increases by two between 6 and 8, and again between 8 and 10. Is there a way to write a function that will return a list of the skipped numbers, i.e. ['7','9'], and otherwise return True?
CodePudding user response:
s = pd.Series(range(df1['item_number'].min(), df1['item_number'].max() + 1))
s[~s.isin(df1['item_number'])].values
array([7, 9], dtype=int64)
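The two lines above return a NumPy array rather than the list-or-True result the question asks for. A minimal sketch of how they might be wrapped into a function follows; the name missing_items and its column parameter are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def missing_items(df, column="item_number"):
    """Return the skipped values as a list, or True if nothing is skipped."""
    # Full run of integers from the smallest to the largest item number.
    full = pd.Series(range(df[column].min(), df[column].max() + 1))
    # Keep only the values that never appear in the column.
    missing = full[~full.isin(df[column])].tolist()
    return missing if missing else True

df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'hhh', 'jjj']})
print(missing_items(df1))
```

Calling tolist() converts the values to plain Python integers, so an empty result is an ordinary empty list and the final check behaves as expected.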
CodePudding user response:
one-liner:
set(range(df1.item_number.min(), df1.item_number.max() + 1)) - set(df1.item_number) or True
CodePudding user response:
You can use Python set and list operations to check whether the input list has any gaps:
li = [1, 2, 3, 4, 5, 6, 8, 10]
def fun(l):
    # Numbers expected between the first and last element, minus those present.
    a = list(set(range(l[0], l[-1] + 1)) - set(l))
    if a == []:
        return True
    else:
        return a
print(fun(li))
Output:
[9, 7]
Also, you can use return sorted(a) if you want the list elements to be returned in order.
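Note that fun reads l[0] and l[-1], so it relies on the list already being sorted in ascending order. A variant that uses min and max instead, and returns the gaps in order, might look like the sketch below. The name find_gaps is hypothetical, and the sketch assumes a non-empty list of integers.

```python
def find_gaps(numbers):
    """Return the missing integers between min and max in order, or True if none."""
    present = set(numbers)
    # min/max make the result independent of the input order.
    expected = range(min(present), max(present) + 1)
    gaps = [n for n in expected if n not in present]
    return gaps if gaps else True

print(find_gaps([10, 1, 3, 2, 6, 5, 4, 8]))
print(find_gaps([1, 2, 3]))
```

Iterating over the range rather than over a set difference keeps the output in ascending order without a separate sort.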
CodePudding user response:
Use range with np.setdiff1d:
In [1518]: import numpy as np
In [1519]: rng = range(df1.item_number.min(), df1.item_number.max() + 1)
In [1523]: res = np.setdiff1d(rng, df1.item_number)
In [1524]: res
Out[1524]: array([7, 9])
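To get the list-or-True behaviour from the question, the array can be checked by its size, since the truth value of a NumPy array with more than one element is ambiguous and raises an error. The following is a rough sketch; the function name missing_numbers and the column parameter are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def missing_numbers(df, column="item_number"):
    """Return the skipped values as a list, or True if there are none."""
    values = df[column].to_numpy()
    # Every integer between the smallest and largest value, inclusive.
    full = np.arange(values.min(), values.max() + 1)
    # setdiff1d returns the sorted unique values in `full` that are not in `values`.
    res = np.setdiff1d(full, values)
    return res.tolist() if res.size else True

df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'hhh', 'jjj']})
print(missing_numbers(df1))
```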
CodePudding user response:
This will do it:
def foo(df):
    # Reindex on the full range; rows for skipped numbers come back as NaN.
    x = df.set_index('item_number').reindex(range(df.item_number.min(), df.item_number.max() + 1))
    x = list(x.index[x.col_A.isna()])
    return x if x else True
Examples:
y = foo(df1)
print(y)
y = foo(df1.loc[range(1, 6)])
print(y)
Output:
[7, 9]
True
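One caveat: foo detects gaps by looking for NaN in col_A, so an existing row whose col_A is itself NaN would be reported as missing. A sketch that compares the index values directly avoids depending on any other column. The name foo_by_index is hypothetical, and the sketch assumes item_number holds integers.

```python
import pandas as pd

def foo_by_index(df, column="item_number"):
    """Return skipped item numbers as a list, or True if none are skipped."""
    # All item numbers expected between the smallest and largest value.
    expected = pd.RangeIndex(int(df[column].min()), int(df[column].max()) + 1)
    # Index.difference keeps the expected values that are absent from the column.
    missing = expected.difference(pd.Index(df[column]))
    return missing.tolist() if len(missing) else True

df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'hhh', 'jjj']})
print(foo_by_index(df1))
print(foo_by_index(df1.loc[range(1, 6)]))
```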