How to find column numbers in increasing order

I have a pandas DataFrame with a column of item numbers that are supposed to increase by 1 on each row.

import pandas as pd

df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'hhh', 'jjj']})

df1
   item_number col_A
0            1   aaa
1            2   bbb
2            3   ccc
3            4   ddd
4            5   eee
5            6   fff
6            8   hhh
7           10   jjj

As you can see, the item number jumps by 2 from 6 to 8 and from 8 to 10. Is there a way to write a function that returns a list of the skipped numbers, i.e. [7, 9], and otherwise returns True?

Answer:

s = pd.Series(range(df1['item_number'].min(), df1['item_number'].max() + 1))
s[~s.isin(df1['item_number'])].values

array([7, 9], dtype=int64)
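The question asks for a function that returns the missing numbers or True. A minimal sketch that wraps the same range/isin idea in a function, assuming the column holds integers; the name missing_numbers and its col parameter are hypothetical:

```python
import pandas as pd


def missing_numbers(df, col="item_number"):
    """Return the integers absent from df[col] between its min and max, or True if none."""
    # Build every integer the column is expected to contain.
    full = pd.Series(range(df[col].min(), df[col].max() + 1))
    # Keep only the expected values that do not appear in the column.
    missing = full[~full.isin(df[col])].tolist()
    return missing if missing else True


df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "hhh", "jjj"],
})
print(missing_numbers(df1))           # [7, 9]
print(missing_numbers(df1.iloc[:6]))  # True
```

.tolist() turns the NumPy values into a plain Python list, which matches the list-or-True shape the question asks for.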

Answer:

One-liner (it returns a set of the missing numbers, or True if there are none):

set(range(df1.item_number.min(), df1.item_number.max() + 1)) - set(df1.item_number) or True

Answer:

You can use Python set and list operations to check whether the input list meets the condition:

li = [1, 2, 3, 4, 5, 6, 8, 10]

def fun(l):
    # Assumes l is sorted ascending: l[0] is the smallest value and l[-1] the largest.
    a = list(set(range(l[0], l[-1] + 1)) - set(l))
    if not a:
        return True
    return a

print(fun(li))

Output:

[9, 7]

If you want the missing numbers in ascending order, use return sorted(a) instead.
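Because fun takes its bounds from l[0] and l[-1], it only works on an already-sorted list; an unsorted input such as [10, 1, 3] gives an empty range and wrongly returns True. A minimal variant that uses min() and max() instead and always returns a sorted list; the name fun_unsorted is hypothetical:

```python
def fun_unsorted(l):
    """Like fun, but the input need not be sorted; returns a sorted list or True."""
    # The expected values run from the smallest to the largest number in the list.
    expected = set(range(min(l), max(l) + 1))
    missing = sorted(expected - set(l))
    return missing if missing else True


print(fun_unsorted([1, 2, 3, 4, 5, 6, 8, 10]))  # [7, 9]
print(fun_unsorted([10, 1, 2, 6, 8, 4, 5, 3]))  # [7, 9]
print(fun_unsorted([3, 1, 2]))                  # True
```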

Answer:

Use range with np.setdiff1d:

import numpy as np

rng = range(df1.item_number.min(), df1.item_number.max() + 1)
res = np.setdiff1d(rng, df1.item_number)
print(res)  # [7 9]
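np.setdiff1d returns the sorted, unique values of its first argument that are not in the second, as a NumPy array. A minimal sketch that turns that into the list-or-True result the question asks for; it swaps range for np.arange, and the name missing_via_setdiff is hypothetical:

```python
import numpy as np
import pandas as pd


def missing_via_setdiff(s):
    """Return the integers missing from Series s (between its min and max) as a list, or True."""
    # np.arange builds the complete expected run of integers.
    expected = np.arange(s.min(), s.max() + 1)
    # setdiff1d keeps the expected values not present in s, already sorted.
    missing = np.setdiff1d(expected, s.to_numpy()).tolist()
    return missing or True  # an empty list is falsy, so this falls back to True


s = pd.Series([1, 2, 3, 4, 5, 6, 8, 10], name="item_number")
print(missing_via_setdiff(s))                     # [7, 9]
print(missing_via_setdiff(pd.Series([4, 5, 6])))  # True
```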

Answer:

This reindexes on the full range of item numbers and reports the rows whose col_A comes back empty (NaN):

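def foo(df):
    # Reindex on every integer from the smallest to the largest item number;
    # numbers that were absent from the original frame get NaN in col_A.
    x = df.set_index('item_number').reindex(range(df.item_number.min(), df.item_number.max() + 1))
    x = list(x.index[x.col_A.isna()])
    return x if x else True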

Examples:

y = foo(df1)
print(y)
y = foo(df1.loc[range(1, 6)])  # rows 1-5 hold item numbers 2-6, with no gaps
print(y)

Output:

[7, 9]
True
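One caveat with foo: it detects gaps through NaN in col_A, so a row that exists but genuinely has a missing col_A is also reported. With the last col_A set to None, foo(df1) would return [7, 9, 10]. A minimal sketch of a different technique (gap scanning with Series.diff) that looks only at item_number, assuming it holds integers; the name skipped_by_gaps is hypothetical:

```python
import pandas as pd


def skipped_by_gaps(df, col="item_number"):
    """Return numbers skipped in df[col] by expanding each gap larger than 1, or True if none."""
    s = df[col].sort_values().reset_index(drop=True)
    step = s.diff()  # difference from the previous sorted value; the first entry is NaN
    missing = []
    for i in step[step > 1].index:
        prev, curr = int(s[i - 1]), int(s[i])
        # Every integer strictly between two neighbouring values is missing.
        missing.extend(range(prev + 1, curr))
    return missing if missing else True


df1 = pd.DataFrame({
    "item_number": [1, 2, 3, 4, 5, 6, 8, 10],
    "col_A": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "hhh", None],
})
print(skipped_by_gaps(df1))           # [7, 9], even though one col_A value is missing
print(skipped_by_gaps(df1.iloc[:6]))  # True
```

This never builds the full expected range, only the gaps, which may help when the numbers span a very large range with few holes.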