Pandas: calculating ratio between values of dataset for some subset

Time:05-08

I have a dataset that looks like this:

value
34
45
3
-3

I want to calculate the ratio of each value to the next value in this dataset: 34/45, 45/3, 3/-3

I can do it via myDataset["value"]/myDataset["value"].shift(-1)
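A minimal self-contained sketch of this first step, assuming myDataset is built directly from the four sample values (the question does not show how it is loaded):

```python
import pandas as pd

# Sample data from the question (assumed construction)
myDataset = pd.DataFrame({"value": [34, 45, 3, -3]})

# shift(-1) moves every value up one row, so each value is divided
# by the value in the following row; the last row has no successor -> NaN
ratio = myDataset["value"] / myDataset["value"].shift(-1)
print(ratio)
```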

The next step is more complex, and that is where I am struggling. I need to calculate the same ratio, but only for a selected set of values. The selection criterion is that a value must be greater than the previous one, i.e. this time the resulting dataset should contain only 45/3. I started with

myDataset.loc[(myDataset["value"] > myDataset["value"].shift(1)), "value"] / myDataset.loc[(myDataset["value"] > myDataset["value"].shift(1)), "value"].shift(-1)

But this is not what I want: myDataset.loc returns only the rows that satisfy the condition in (), so shift(-1) then finds the next value that also fits the condition, not the actual next value. I need the real next value from the original dataset.
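The behaviour described above can be reproduced with the sample data. This sketch (same assumed construction of myDataset) shows that the boolean .loc selection returns a new one-row Series, so the shifted divisor is NaN instead of 3:

```python
import pandas as pd

myDataset = pd.DataFrame({"value": [34, 45, 3, -3]})
mask = myDataset["value"] > myDataset["value"].shift(1)

# .loc with a boolean mask returns a new, smaller Series (only label 1 here);
# myDataset itself is left unchanged
subset = myDataset.loc[mask, "value"]

# shift(-1) now operates inside the subset, where 45 has no following row,
# so the divisor becomes NaN rather than 3
attempt = subset / subset.shift(-1)
print(attempt)
```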

How can I do it?
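One way to keep the real next value, sketched here under the same assumptions, is to compute the ratio on the full column first and apply the mask only afterwards. Because both Series share the original index, the boolean selection keeps the ratio of each selected row to its original neighbour:

```python
import pandas as pd

myDataset = pd.DataFrame({"value": [34, 45, 3, -3]})
value = myDataset["value"]

# 1. Ratio on the full, unfiltered column, so shift(-1) always
#    refers to the next row of the original data
ratio = value / value.shift(-1)

# 2. Only afterwards keep the rows whose value is greater than the previous one
mask = value > value.shift(1)
result = ratio[mask]
print(result)
```

With the sample data this leaves only label 1 with the value 15.0, i.e. 45/3.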

Answer:

We could use the condition that the result is larger than 1:

value = df["value"]
result = value.shift() / value
print(result[result>1])
2    15.0

Or, if you also want to keep the result of 3/-3, use the abs function:

value = df["value"]
result = value.shift() / value
print(result[result.abs()>=1])
2    15.0
3    -1.0
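A self-contained version of this answer, assuming the sample data from the question in a DataFrame called df. Note that value.shift() / value divides each previous value by the current one, so 45/3 is labelled with index 2 (the divisor's row). The filter also tests the size of the ratio rather than comparing a value with its predecessor, so it yields the expected 45/3 for this data but is not the same condition in general:

```python
import pandas as pd

df = pd.DataFrame({"value": [34, 45, 3, -3]})
value = df["value"]

# shift() moves values DOWN one row: result[i] = value[i-1] / value[i]
result = value.shift() / value

gt1 = result[result > 1]            # keeps only 45/3
abs_ge1 = result[result.abs() >= 1]  # also keeps 3/-3
print(gt1)
print(abs_ge1)
```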

Answer:

You can do this with a list comprehension (this will return only ratios that meet the condition):

[a/b for a, b in zip(df["value"], df["value"].shift(-1)) if a>b]

If you want to create a new column in the dataframe, you can change it slightly:

import numpy as np
df["ratio"] = [a/b if a>b else np.nan for a, b in zip(df["value"], df["value"].shift(-1))]

This gives a full-length list (NaN where the condition fails) rather than only the ratios that meet the condition, which is what the first option returns.
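Both comprehensions from this answer in one runnable sketch, assuming the sample data from the question. Here b is the next value, so a > b keeps rows that are greater than their successor; for this data that includes 3/-3 as well as 45/3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [34, 45, 3, -3]})

# Pair each value with the next one; the last value is paired with NaN
pairs = list(zip(df["value"], df["value"].shift(-1)))

# Only the ratios that meet the condition (any comparison with NaN is False)
filtered = [a / b for a, b in pairs if a > b]

# Full-length column with NaN where the condition fails
df["ratio"] = [a / b if a > b else np.nan for a, b in pairs]
print(filtered)
print(df)
```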

Answer:

Just make a new shifted column, and work with that:

df['shifted'] = df.value.shift(-1)

print(df.value/df.shifted)
print()
print(df.apply(lambda x: x.value/x.shifted if x.value > x.shifted else None, axis=1))

Output:

0     0.755556
1    15.000000
2    -1.000000
3          NaN
dtype: float64

0     NaN
1    15.0
2    -1.0
3     NaN
dtype: float64
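The row-wise apply can also be written in vectorized form with Series.where, which keeps values where the condition is True and fills NaN elsewhere. A sketch under the same assumed sample data, using this answer's condition (value greater than the shifted, i.e. next, value):

```python
import pandas as pd

df = pd.DataFrame({"value": [34, 45, 3, -3]})
df["shifted"] = df["value"].shift(-1)

ratio = df["value"] / df["shifted"]

# Keep the ratio where value > shifted, NaN elsewhere
selected = ratio.where(df["value"] > df["shifted"])
print(selected)
```

This should give the same values as the apply version without calling a Python function for every row.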