I have a CSV file that looks, for example, like this:
duration | concentration | measurement |
---|---|---|
1.2 | 0 | 10 |
1.25 | 0 | 12 |
... | ... | ... |
10.3 | 0 | 11 |
10.5 | 10 | 100 |
10.6 | 20 | 150 |
10.67 | 30 | 156 |
10.75 | 0 | 12.5 |
11 | 0 | 12 |
... | ... | ... |
I filtered out all the rows with concentration 0 with the following code:
dF2 = dF1[dF1["concentration"]>10][["duration","measurement","concentration"]]
But I would like to keep 100 (or some specific number n) extra rows where the concentration is held at 0, before the rows with concentrations greater than 10 begin, so that I have a baseline when plotting the data.
Has anybody had experience with a similar problem, or could somebody help me, please?
CodePudding user response:
You can use boolean masks for boolean indexing:
# number of baseline rows to keep
n = 2
# cols to keep
cols = ['duration', 'measurement', 'concentration']
# is the concentration greater than 10?
m1 = dF1['concentration'].gt(10)
# is the row one of the n initial concentration 0?
m2 = dF1['concentration'].eq(0).cumsum().le(n)
# if you have values in between 0 and 10 and do not want those
# m2 = (m2:=dF1['concentration'].eq(0)) & m2.cumsum().le(n)
# or
# m2 = dF1.index.isin(dF1[dF1['concentration'].eq(0)].head(n).index)
# keep rows where either condition is met
dF2 = dF1.loc[m1|m2, cols]
If you only want to keep initial rows before the first value above the threshold, change m2 to:
# keep up to n initial rows with concentration=0
# only until the first row above threshold is met
m2 = dF1['concentration'].eq(0).cumsum().le(n) & ~m1.cummax()
output:
duration measurement concentration
0 1.20 10.0 0
1 1.25 12.0 0
4 10.60 150.0 20
5 10.67 156.0 30
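To try the masks end to end, here is a minimal, self-contained sketch. The sample values are only modelled on the table in the question (the elided rows are left out), and `is_zero` is a helper name introduced here so that only rows with concentration exactly 0 count as baseline rows:

```python
import pandas as pd

# Illustrative sample modelled on the question's table (elided rows omitted)
dF1 = pd.DataFrame({
    "duration": [1.2, 1.25, 10.3, 10.5, 10.6, 10.67, 10.75, 11.0],
    "concentration": [0, 0, 0, 10, 20, 30, 0, 0],
    "measurement": [10, 12, 11, 100, 150, 156, 12.5, 12],
})

n = 2  # number of baseline rows to keep
cols = ["duration", "measurement", "concentration"]

# rows with concentration above the threshold
m1 = dF1["concentration"].gt(10)

# rows with concentration exactly 0 (helper name, not from the answer above)
is_zero = dF1["concentration"].eq(0)

# the first n zero rows, but only before the first row above the threshold
m2 = is_zero & is_zero.cumsum().le(n) & ~m1.cummax()

dF2 = dF1.loc[m1 | m2, cols]
print(dF2)
```

With this sample and n = 2, the selected rows have index 0, 1, 4 and 5, which matches the output shown above.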
CodePudding user response:
You can filter the records and concatenate them to get the desired result:
n = 100 # number of initial rows with concentration 0 required
dF2 = pd.concat([dF1[dF1["concentration"]==0].head(n),dF1[dF1["concentration"]>10]])[["duration","measurement","concentration"]]
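One thing to watch with this approach: `pd.concat` places all baseline rows first, even if some of them come later in the file than the rows above the threshold. A possible sketch that restores the original order is below. The function name `with_baseline`, its `threshold` parameter, and the sample data are assumptions made for illustration, and it assumes the default, increasing index:

```python
import pandas as pd

def with_baseline(df, n, threshold=10):
    """Hypothetical helper: first n zero rows plus all rows above threshold."""
    cols = ["duration", "measurement", "concentration"]
    baseline = df[df["concentration"] == 0].head(n)
    above = df[df["concentration"] > threshold]
    # sort_index puts the rows back in their original order
    # (assumes a unique, increasing index such as the default RangeIndex)
    return pd.concat([baseline, above])[cols].sort_index()

# Illustrative sample modelled on the question's table
sample = pd.DataFrame({
    "duration": [1.2, 1.25, 10.3, 10.5, 10.6, 10.67, 10.75, 11.0],
    "concentration": [0, 0, 0, 10, 20, 30, 0, 0],
    "measurement": [10, 12, 11, 100, 150, 156, 12.5, 12],
})

result = with_baseline(sample, n=4)
print(result)
```

Here n = 4 picks up one zero row that comes after the peak, and sorting by index keeps it in its original position instead of ahead of the rows above the threshold.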
CodePudding user response:
You can simply filter the data frame for rows where the concentration is zero, select the first 100 (or first n) rows of the filtered data frame using 'head', and combine them with your dF2.
n = 100 # you can change this to include the number of rows you want.
df_baseline = dF1[dF1["concentration"] == 0][["duration","measurement","concentration"]].head(n)
dF2 = dF1[dF1["concentration"]>10][["duration","measurement","concentration"]]
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_final = pd.concat([df_baseline, dF2])
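The answers here all take the first n zero rows of the file. If the baseline should instead be the n zero rows immediately before the concentration first rises above 10 (one possible reading of the question, not something the asker confirmed), a sketch along these lines might work. The sample data and the names `above` and `before_first` are illustrative:

```python
import pandas as pd

# Illustrative sample modelled on the question's table
dF1 = pd.DataFrame({
    "duration": [1.2, 1.25, 10.3, 10.5, 10.6, 10.67, 10.75, 11.0],
    "concentration": [0, 0, 0, 10, 20, 30, 0, 0],
    "measurement": [10, 12, 11, 100, 150, 156, 12.5, 12],
})

n = 2  # number of baseline rows to keep

# rows with concentration above the threshold
above = dF1["concentration"].gt(10)

# True for every row before the first row above the threshold
before_first = ~above.cummax()

# the last n zero rows before the first rise
baseline = dF1[before_first & dF1["concentration"].eq(0)].tail(n)

dF2 = pd.concat([baseline, dF1[above]])
print(dF2)
```

With this sample, the baseline is the two zero rows closest to the rise (index 1 and 2) rather than the first two rows of the file.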