I have a DataFrame df:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0],
                            [1,0,1,1,0,0],
                            [1,1,0,0,0,1],
                            [1,0,1,0,1,1],
                            [0,0,1,0,0,1]]))
df
From df, I would like to create a new data frame based on the following condition: if a column contains three or more 1s, the corresponding value in the new data frame is 1; otherwise it is 0.
Expected output of the new data frame:
1 0 1 0 0 1
Answer:
You can also get it without apply. Sum each column (that is, down the rows, with axis=0) and turn the totals into a boolean with gt(2), which for integer counts means "three or more":
res = df.sum(axis=0).gt(2).astype(int)
print(res)
0 1
1 0
2 1
3 0
4 0
5 1
dtype: int32
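To make the two steps above concrete, here is a small sketch (the variable name col_totals is my own) that prints the per-column totals produced by axis=0 and then applies ge(3), which reads exactly like "three or more" and, for integer counts, gives the same result as gt(2). The int dtype shown in the printed output can differ by platform (int32 vs int64).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# axis=0 collapses the rows, leaving one total per column
col_totals = df.sum(axis=0)
print(col_totals.tolist())  # [3, 2, 4, 1, 2, 3]

# ge(3) reads as "three or more"; for integer counts it matches gt(2)
res = df.sum(axis=0).ge(3).astype(int)
print(res.tolist())  # [1, 0, 1, 0, 0, 1]
```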
As David pointed out, res is a Series, not a DataFrame. If you need a DataFrame, chain to_frame() onto the end of it.
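A minimal sketch of the to_frame() step (the names col_frame and row_frame are just illustrative): to_frame() gives one row per original column, and adding .T turns that into a single row, which matches the layout of the expected output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))
res = df.sum(axis=0).gt(2).astype(int)

# to_frame(): a 6x1 DataFrame, one row per column of df
col_frame = res.to_frame()
print(col_frame.shape)  # (6, 1)

# Transposing gives a 1x6 DataFrame: a single row of flags
row_frame = res.to_frame().T
print(row_frame.shape)  # (1, 6)
print(row_frame)
```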
Answer:
You could do the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0],
                            [1,0,1,1,0,0],
                            [1,1,0,0,0,1],
                            [1,0,1,0,1,1],
                            [0,0,1,0,0,1]]))
# apply runs the lambda once per column: 1 if it has more than two 1s, else 0
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))
In [6]: df_res
Out[6]:
0
0 1
1 0
2 1
3 0
4 0
5 1
Instead of np.sum(c), you can also use c.sum().
If you want the result as a single row (transposed), do the following instead:
df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
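If you would rather avoid calling a Python lambda once per column, one possible alternative (a sketch, not part of the answer above; the name flags is my own) is to do the column sums and the comparison on the underlying NumPy array and then wrap the result as a one-row DataFrame that keeps df's column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# Sum each column of the raw array, then flag totals of three or more
flags = (df.to_numpy().sum(axis=0) >= 3).astype(int)
print(flags)  # [1 0 1 0 0 1]

# Wrap as a single-row DataFrame with the same column labels as df
df_res = pd.DataFrame([flags], columns=df.columns)
print(df_res)
```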