I have a DataFrame with several dummy columns. I'm trying to replace each "1" with the distance back to the previous "1" in the same column, measured in index values. The first "1" in each column has no previous "1" and therefore no distance, so I have labelled it "A" in the desired output.
I asked a similar question before (Distance between values in dataframe), where I needed the distances to the values in one specific column, but I have not managed to adapt that answer to this problem.
How can I achieve the output of dfTarget?
import pandas as pd
df = pd.DataFrame({'index': [240, 251, 282, 301, 321, 325, 328, 408], 'e1': ['0','1','0','0','0','0','1','0'], 'e2': ['1','0','1','0','0','1','0','0']})
df.set_index('index', inplace=True)
dfTarget = pd.DataFrame({'index': [240, 251, 282, 301, 321, 325, 328, 408], 'e1': ['0','A','0','0','0','0','77','0'], 'e2': ['A','0','42','0','0','43','0','0']})
dfTarget.set_index('index', inplace=True)
print(df)
print("------")
print(dfTarget)
       e1  e2
index
240     0   1
251     1   0
282     0   1
301     0   0
321     0   0
325     0   1
328     1   0
408     0   0
------
       e1  e2
index
240     0   A
251     A   0
282     0  42
301     0   0
321     0   0
325     0  43
328    77   0
408     0   0
CodePudding user response:
Using a custom function:
def get_diff(s):
    return (
        s.index.to_series()               # index to series
        [s.eq('1')].diff().fillna('A')    # keep 1s and get diff
        .reindex(s.index, fill_value=0)   # reindex with 0s
    )
dfTarget = df.apply(get_diff)
output:
         e1    e2
index
240       0     A
251       A     0
282       0  42.0
301       0     0
321       0     0
325       0  43.0
328    77.0     0
408       0     0
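Note that the output above holds floats (42.0) and an integer 0, whereas dfTarget in the question holds strings ('42', '0'). If the string form matters, a variant along these lines may work. It is only a minimal sketch: get_diff_str is a name made up for this example, and it assumes every distance is a whole number.

```python
import pandas as pd

df = pd.DataFrame({
    'index': [240, 251, 282, 301, 321, 325, 328, 408],
    'e1': ['0', '1', '0', '0', '0', '0', '1', '0'],
    'e2': ['1', '0', '1', '0', '0', '1', '0', '0'],
}).set_index('index')

def get_diff_str(s):
    # hypothetical helper, same idea as get_diff but returning strings
    ones = s.index.to_series()[s.eq('1')]   # index labels of the rows holding '1'
    dist = ones.diff()                      # NaN for the first '1' in the column
    # NaN (no previous '1') becomes 'A', every other distance becomes e.g. '42'
    labels = dist.map(lambda d: 'A' if pd.isna(d) else str(int(d)))
    return labels.reindex(s.index, fill_value='0')   # all other rows become '0'

result = df.apply(get_diff_str)
print(result)
```

On the sample data this should give the same values as dfTarget, with every cell stored as a string.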
CodePudding user response:
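for col in df.columns:
    ix_list = list(df[df[col] == '1'].index)  # index labels of the rows holding a '1'
    if ix_list:
        df.loc[ix_list[0], col] = 'A'  # the first '1' has no previous '1'
    for i in range(1, len(ix_list)):  # start at 1 to skip the first '1'
        previous = ix_list[i - 1]
        # assign with a single .loc[row, col]; chained df.loc[row][col] may write to a copy
        df.loc[ix_list[i], col] = ix_list[i] - previous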
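The same distances can probably also be computed for all columns at once, without an explicit Python loop. The sketch below is an assumption-laden alternative, not the approach of either answer: it keeps the result numeric, so the first "1" of each column is reported in a separate boolean frame (first_one, a name chosen here) instead of being written as "A".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'index': [240, 251, 282, 301, 321, 325, 328, 408],
    'e1': ['0', '1', '0', '0', '0', '0', '1', '0'],
    'e2': ['1', '0', '1', '0', '0', '1', '0', '0'],
}).set_index('index')

mask = df.eq('1')  # True wherever a '1' sits

# index label at every '1', NaN everywhere else
pos = pd.DataFrame(
    np.where(mask.to_numpy(), df.index.to_numpy()[:, None], np.nan),
    index=df.index, columns=df.columns,
)

prev = pos.ffill().shift()          # label of the most recent '1' in an earlier row
dist = pos - prev                   # NaN unless this row and an earlier row both hold '1'

result = dist.fillna(0).astype(int)   # distances as integers, 0 elsewhere
first_one = mask & prev.isna()        # True at the first '1' of each column

print(result)
print(first_one)
```

Whether this is preferable depends on the data: it avoids mixing strings and numbers in one column, at the cost of carrying the "no previous 1" information in a second frame.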