Python pandas. How to include rows before specific conditioned rows?

I have a CSV file that looks roughly like this, for example:

duration	concentration	measurement
1.2	0	10
1.25	0	12
...	...	...
10.3	0	11
10.5	10	100
10.6	20	150
10.67	30	156
10.75	0	12.5
11	0	12
...	...	...

I filtered out all the rows with concentration 0 (in fact, every row with a concentration of 10 or less) with the following code:

dF2 = dF1[dF1["concentration"]>10][["duration","measurement","concentration"]]

But I would like to keep 100 (or some specific number n) extra rows where the concentration is still 0, placed before the rows with concentrations greater than 10 begin, so that I have a baseline when plotting the data.

Has anybody had experience with a similar problem, or could somebody help me, please?

CodePudding user response：

You can use boolean masks for boolean indexing:

# number of baseline rows to keep
n = 2
# cols to keep
cols = ['duration', 'measurement', 'concentration']

# is the concentration greater than 10?
m1 = dF1['concentration'].gt(10)
# is the row at or before the n-th row with concentration 0?
m2 = dF1['concentration'].eq(0).cumsum().le(n)

# if you have values between 0 and 10 and do not want those
# m2 = (m2:=dF1['concentration'].eq(0)) & m2.cumsum().le(n)
# or
# m2 = dF1.index.isin(dF1[dF1['concentration'].eq(0)].head(n).index)

# keep rows where either condition is met
dF2 = dF1.loc[m1|m2, cols]

If you only want to keep initial rows that come before the first value above the threshold, change m2 to:

# keep up to n initial rows with concentration=0
# only until the first row above threshold is met
m2 = dF1['concentration'].eq(0).cumsum().le(n) & ~m1.cummax()

output:

   duration  measurement  concentration
0      1.20         10.0              0
1      1.25         12.0              0
4     10.60        150.0             20
5     10.67        156.0             30
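
For anyone who wants to try the mask approach end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it is typed in from the table in the question with the elided "..." rows left out, so the real CSV will differ. It combines the zero check with the "only before the first row above the threshold" variant.

```python
import pandas as pd

# Assumed sample data, copied from the question's table (elided rows omitted)
dF1 = pd.DataFrame({
    "duration": [1.2, 1.25, 10.3, 10.5, 10.6, 10.67, 10.75, 11.0],
    "concentration": [0, 0, 0, 10, 20, 30, 0, 0],
    "measurement": [10, 12, 11, 100, 150, 156, 12.5, 12],
})

n = 2  # number of baseline rows to keep
cols = ["duration", "measurement", "concentration"]

# rows whose concentration is above the threshold
m1 = dF1["concentration"].gt(10)

# rows with concentration 0 that are among the first n such rows
# and that occur before the first row above the threshold
is_zero = dF1["concentration"].eq(0)
m2 = is_zero & is_zero.cumsum().le(n) & ~m1.cummax()

# keep rows where either condition is met
dF2 = dF1.loc[m1 | m2, cols]
print(dF2)
```

With this sample data the kept rows are those at index 0, 1, 4 and 5, matching the output above. The row with concentration 10 is dropped because the filter is strictly greater than 10.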

CodePudding user response：

You can filter the records and concatenate them to get the desired result:

n = 100  # number of initial rows with concentration 0 required

dF2 = pd.concat([dF1[dF1["concentration"]==0].head(n),dF1[dF1["concentration"]>10]])[["duration","measurement","concentration"]]
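
Note that head(n) takes the first n zero rows in the file. If the baseline should instead be the n zero rows closest to where the concentration first rises above 10, the same concat idea can probably be adapted as sketched below. The helper name with_preceding_baseline, its parameters, and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def with_preceding_baseline(df, n, threshold=10, col="concentration"):
    """Hypothetical helper: rows above `threshold`, preceded by the last
    n zero-valued rows found before the first such row."""
    above = df[col] > threshold
    if not above.any():
        # nothing exceeds the threshold: return an empty frame, same columns
        return df.iloc[0:0]
    # position (not label) of the first row above the threshold
    first_pos = int(above.to_numpy().argmax())
    before = df.iloc[:first_pos]
    # last n zero rows before the rise
    baseline = before[before[col] == 0].tail(n)
    return pd.concat([baseline, df[above]])

# Assumed sample data, copied from the question's table (elided rows omitted)
dF1 = pd.DataFrame({
    "duration": [1.2, 1.25, 10.3, 10.5, 10.6, 10.67, 10.75, 11.0],
    "concentration": [0, 0, 0, 10, 20, 30, 0, 0],
    "measurement": [10, 12, 11, 100, 150, 156, 12.5, 12],
})

result = with_preceding_baseline(dF1, n=2)
print(result[["duration", "measurement", "concentration"]])
```

With n=2 on this sample, the baseline rows are the ones at index 1 and 2 rather than 0 and 1, because tail(n) picks the zero rows nearest to the first value above the threshold.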

CodePudding user response：

You can simply filter the data frame for rows where the concentration is zero, select the top 100 (or top n) rows of the filtered data frame using head, and combine them with your dF2.

n = 100  # you can change this to the number of baseline rows you want

df_baseline = dF1[dF1["concentration"] == 0][["duration","measurement","concentration"]].head(n)

dF2 = dF1[dF1["concentration"] > 10][["duration","measurement","concentration"]]

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_final = pd.concat([df_baseline, dF2])