Create a new data frame from an existing data frame based on a condition

Time:12-15

I have a data frame df

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1], 
[0,0,1,0,0,1]]))
df

Now, from df I would like to create a new data frame based on a condition. Condition: if a column contains three or more 1s, the corresponding value in the new data frame is 1; otherwise it is 0.

Expected output of the new data frame:
    1 0 1 0 0 1

CodePudding user response:

You can also get this without apply. Sum down each column (axis=0), then build a boolean mask with gt(2) and cast it to int:

res = df.sum(axis=0).gt(2).astype(int)

print(res)

0    1
1    0
2    1
3    0
4    0
5    1
dtype: int32

As David pointed out, the result of the above is a Series. If you need a DataFrame, chain to_frame() at the end. That gives a single column, so add .T if you want a single row as in the expected output.
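
A minimal sketch of that to_frame() chain, using the data from the question. The names res_col and res_row are only illustrative and do not come from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# Count the 1s in each column, flag columns with more than two, cast to int
res = df.sum(axis=0).gt(2).astype(int)

res_col = res.to_frame()    # DataFrame with 6 rows and 1 column
res_row = res.to_frame().T  # DataFrame with 1 row and 6 columns

print(res_row)
```

res_row has the one-row layout shown in the question's expected output.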

CodePudding user response:

You could do the following:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1], 
[0,0,1,0,0,1]]))
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))

In [6]: df_res
Out[6]: 
   0
0  1
1  0
2  1
3  0
4  0
5  1

Instead of np.sum(c) you can also use c.sum().

And if you want the result transposed, do the following instead:

df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
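
If you want to convince yourself that the apply version and the vectorized sum/gt version give the same values, a quick check along these lines should do. The names via_apply, via_sum and same are made up for this sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# Column-wise apply: the lambda receives one column at a time
via_apply = df.apply(lambda c: 1 if c.sum() > 2 else 0)

# Vectorized version: column sums compared against 2, cast to int
via_sum = df.sum(axis=0).gt(2).astype(int)

# Compare the raw values so that differing integer dtypes do not matter
same = (via_apply.to_numpy() == via_sum.to_numpy()).all()
print(same)
```

On larger frames the vectorized form is usually the faster of the two, since it avoids calling a Python function once per column.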