So I have a Dataframe that looks like this:
ValueA | Value B |
---|---|
a | 1 |
NaN | 2 |
NaN | 3 |
b | 4 |
NaN | 5 |
NaN | 6 |
c | 7 |
NaN | 8 |
NaN | 9 |
I want to create a DataFrame, in which all values between b and c are stored - including b, excluding c.
The expected output is as follows:
ValueA | Value B |
---|---|
b | 4 |
NaN | 5 |
NaN | 6 |
CodePudding user response:
I think this will do what your question asks:
import pandas as pd
import numpy as np
import math
df = pd.DataFrame({
'ValueA' : ['a', np.nan, np.nan, 'b', np.nan, np.nan, 'c', np.nan, np.nan],
'Value B' : list(range(1, 10))})
print(df)
start = df[df['ValueA'] == 'b'].index.tolist()[0]
end = start 1
while end < df.shape[0] and type(df['ValueA'].loc[end]) is not str and math.isnan(df['ValueA'].loc[end]):
end = 1
df2 = df.loc[start:end - 1].reset_index(drop=True, inplace=False)
print(f"\n{df2}")
Output:
ValueA Value B
0 a 1
1 NaN 2
2 NaN 3
3 b 4
4 NaN 5
5 NaN 6
6 c 7
7 NaN 8
8 NaN 9
ValueA Value B
0 b 4
1 NaN 5
2 NaN 6
CodePudding user response:
Use two masks:
# rows after the first b
m1 = df['ValueA'].eq('b').cummax()
# rows after the first c
m2 = df['ValueA'].eq('c').cummax()
# rows between b and c excluded
df[m1&~m2]
Another approach, which however will give the rows between the first b/c (included) and b/c (excluded). This will also work between two b, two c, or if c comes before b!
df[df['ValueA'].isin(('b', 'c')).cumsum().eq(1)]
output:
ValueA ValueB
3 b 4
4 NaN 5
5 NaN 6
CodePudding user response:
If you have the "default" index (composed of consecutive integers, starting from 0), then one of possible solutions is to find indices of the "wanted" rows:
ind1, ind2 = df.index[df['ValueA'].isin(['b', 'c'])].tolist()
Then just retrieve rows from this range (excluding the right border):
result = df[ind1:ind2]
The result is:
ValueA Value B
3 b 4
4 NaN 5
5 NaN 6
This code will work also if both "border" values are not in lexicographic order. Values of Value1 contained possibly between both border rows, which are "< 'b'" or "> 'c'", will not do any harm, which is not true in the case of one of other answers.
Note that df.loc[ind1:ind2]
will get also the row with ValueA == 'c'.