Home > Net >  pandas find start and stop point of non-null values
pandas find start and stop point of non-null values

Time:11-25

I'd like to find the start and stop points of a column and flag them as shown below:

value flag
NaN NaN
NaN NaN
1 start
2 NaN
1 NaN
3 NaN
2 stop
NaN NaN
1 start
2 stop

CodePudding user response:

  • start occurs when the current value is notnull and the previous value isnull
  • stop occurs when the current value is notnull and the next value isnull

Generate these conditions using shift and assign using loc:

start = df.value.notnull() & df.value.shift().isnull()
stop = df.value.notnull() & df.value.shift(-1).isnull()

df.loc[start, 'flag'] = 'start'
df.loc[stop, 'flag'] = 'stop'

#    value   flag
# 0    NaN    NaN
# 1    NaN    NaN
# 2    1.0  start
# 3    2.0    NaN
# 4    1.0    NaN
# 5    3.0    NaN
# 6    2.0   stop
# 7    NaN    NaN
# 8    1.0  start
# 9    2.0   stop

Alternatively assign using mask:

df['flag'] = df['flag'].mask(start, 'start')
df['flag'] = df['flag'].mask(stop, 'stop')

CodePudding user response:

Here I iterated through the rows and used a flag to indicate if we were starting or not.

start_flag = 0
for index, row in df.iterrows():
  if row['val'].isnull():
    df.loc[index, 'flag'] = "NaN"
    start_flag = 0
  else:
    if start_flag == 0:
      df.loc[index, 'flag'] = "start"
      start_flag = 1
    else:
      if (index < df.shape[0]-1 and df.loc[index 1, 'val'].isnull()) or index == df.shape[0]-1:
         df.loc[index, 'flag'] = "stop"

CodePudding user response:

Here's what you need:

# Auxiliar columns to detect start and end
df['Past'] = df['Value'].shift(-1)
df['Future'] = df['Value'].shift(1)

# Auxiliar function to complete new column
def Find_start_stop_Null(row):
    flag = np.nan
    if ((not pd.isnull(row['Value'])) and (pd.isnull(row['Future']))):
        flag = 'start'
    elif ((not pd.isnull(row['Value'])) and (pd.isnull(row['Past']))):
        flag = 'stop'
    return flag

df['flag'] = df.apply(lambda row: Find_start_stop_Null(row), axis=1)
# Drop unnecessary columns
df = df.drop('Past', axis=1)
df = df.drop('Future', axis=1)

CodePudding user response:

df = pd.DataFrame({'value': [np.nan, np.nan, 1, 2, 1, 3, 2, np.nan, 1, 2]}, dtype='object')
df['flag'] = df.value.groupby(df.value.notna().diff().ne(0).cumsum()). \
    apply(lambda se: pd.Series(['start']   [np.nan] * (se.size - 2)   ['stop'], index=se.index) if se.notna().sum() else se)
print(df)
  value   flag
0   NaN    NaN
1   NaN    NaN
2     1  start
3     2    NaN
4     1    NaN
5     3    NaN
6     2   stop
7   NaN    NaN
8     1  start
9     2   stop
  • Related