Pandas - index distance between values in columns

Time:10-13

I have a dataframe with a series of dummy columns. I'm trying to convert each "1" to the distance, measured in index values, back to the previous "1" in the same column. The first "1" in each column has no previous "1" and therefore no distance, so it has to be skipped. I have labelled it "A" in the desired output.

I have asked a similar question before, where I needed the distances to values in a specific column, but I have failed to adapt that to this problem: Distance between values in dataframe

How can I achieve the output of dfTarget?

Thank you in advance.

import pandas as pd

df = pd.DataFrame({'index': [240, 251, 282, 301, 321, 325, 328, 408], 'e1': ['0','1','0','0','0','0','1','0'], 'e2': ['1','0','1','0','0','1','0','0']})
df.set_index('index', inplace=True)
dfTarget = pd.DataFrame({'index': [240, 251, 282, 301, 321, 325, 328, 408], 'e1': ['0','A','0','0','0','0','77','0'], 'e2': ['A','0','42','0','0','43','0','0']})
dfTarget.set_index('index', inplace=True)

print(df)
print("------")
print(dfTarget)

      e1 e2
index      
240    0  1
251    1  0
282    0  1
301    0  0
321    0  0
325    0  1
328    1  0
408    0  0
------
       e1  e2
index        
240     0   A
251     A   0
282     0  42
301     0   0
321     0   0
325     0  43
328    77   0
408     0   0

CodePudding user response:

Using a custom function:

def get_diff(s):
    return (
    s.index.to_series()               # index to series
     [s.eq('1')].diff().fillna('A')   # keep 1s and get diff
     .reindex(s.index, fill_value=0)  # reindex with 0s
    )
    
dfTarget = df.apply(get_diff)

output:

         e1    e2
index            
240       0     A
251       A     0
282       0  42.0
301       0     0
321       0     0
325       0  43.0
328    77.0     0
408       0     0
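
If the "A" marker is not strictly needed, a variant of the same idea can keep the columns numeric. The sketch below is only an illustration, not part of the answer above: the helper name index_gaps is made up, and it assumes the index labels are unique and the dummies are the strings '0' and '1' as in the question. The first "1" in each column is left as NaN instead of "A".

```python
import pandas as pd

df = pd.DataFrame(
    {"e1": ["0", "1", "0", "0", "0", "0", "1", "0"],
     "e2": ["1", "0", "1", "0", "0", "1", "0", "0"]},
    index=pd.Index([240, 251, 282, 301, 321, 325, 328, 408], name="index"),
)

def index_gaps(col):
    # hypothetical helper, named here for illustration only
    labels = col.index.to_series()   # Series whose values are the index labels
    ones = labels[col.eq("1")]       # keep only the rows that hold "1"
    gaps = ones.diff()               # distance to the previous "1"; NaN for the first
    # rows without a "1" are missing from gaps and get 0; the first "1" stays NaN
    return gaps.reindex(col.index, fill_value=0)

result = df.apply(index_gaps)
print(result)
```

Because every column stays float, the result can be summed or compared directly, and result.isna() picks out the first "1" of each column.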

CodePudding user response:

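for col in df.columns:
    ix_list = list(df[df[col] == '1'].index)  # index labels of the rows holding '1'
    for i in range(1, len(ix_list)):  # start at 1 to skip the first '1', which has no predecessor
        previous = ix_list[i - 1]
        # assign with a single .loc[row, col] call; chained df.loc[row][col] = ... may write to a copy
        df.loc[ix_list[i], col] = ix_list[i] - previous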
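
A self-contained sketch of the same loop idea, in case you want to reproduce dfTarget exactly. It works on a copy, marks the first "1" of each column as "A", and stores the distances as strings so the dtypes match the question's dfTarget. The function name mark_distances is made up for this example, and storing strings is an assumption based on how dfTarget is built in the question.

```python
import pandas as pd

df = pd.DataFrame({'index': [240, 251, 282, 301, 321, 325, 328, 408],
                   'e1': ['0', '1', '0', '0', '0', '0', '1', '0'],
                   'e2': ['1', '0', '1', '0', '0', '1', '0', '0']})
df.set_index('index', inplace=True)

def mark_distances(frame):
    # hypothetical helper: returns a new frame and leaves the input untouched
    out = frame.copy()
    for col in out.columns:
        ix_list = list(out.index[out[col] == '1'])  # labels of the rows holding '1'
        if ix_list:
            out.loc[ix_list[0], col] = 'A'          # first '1' has no previous '1'
        for previous, current in zip(ix_list, ix_list[1:]):
            # single .loc[row, col] assignment, so the write lands in out itself
            out.loc[current, col] = str(current - previous)
    return out

out = mark_distances(df)
print(out)
```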