Home > Enterprise >  Count number of occurences of a column changing value from 0 to 1 and 0 to 2
Count number of occurences of a column changing value from 0 to 1 and 0 to 2

Time:12-07

Suppose I have df1['col1'] and df2['col2'] both columns equal in length. The values in the columns are 0, 1 and 2 only.

How do I find the counts of changes from 0 -> 2 and 1 -> 2 from df1 to df2 if each corresponding row in the df is a transition about an element?

As an example, I need to count the changes in the 2 columns below

    print(df1['orig_label'][0:5])
    Name: predicted_label, dtype: int64
    0    2
    1    2
    2    0
    3    0
    4    1

print(df2['predicted_label'][0:5])
Name: predicted_label, dtype: int64
0    1
1    1
2    0
3    2
4    2

Expected output: 1 and 1

which are counts of 0->2 and 1->2 transition

CodePudding user response:

This is a use case for pandas.crosstab:

pd.crosstab(df1['col1'], df2['col2'])

Output:

col2  0  1  2
col1         
0     1  0  1
1     0  0  1
2     0  2  0

To get only 0->2 and 1->2:

pd.crosstab(df1['col1'], df2['col2']).loc[[0,1], [2]]

Output:

col2  2
col1   
0     1
1     1

CodePudding user response:

How about this?:

df1 = pd.DataFrame(data={"col1":[0,1,2,3,4], "col2":[2,2,0,0,1]}, columns=["col1", "col2"])

df["0->2"] = df.apply(lambda row: 1 if row["col1"] == 0 and row["col2"] == 2 else 0, axis=1)
df["1->2"] = df.apply(lambda row: 1 if row["col1"] == 1 and row["col2"] == 2 else 0, axis=1)

print("N 0->2 = {}".format(df["0->2"].sum()))
print("N 1->2 = {}".format(df["1->2"].sum()))

This has the potential downside of adding two additional columns to your original dataframe, but you could also just create them as separate series objects if you don't want to do that:

df = pd.DataFrame(data={"col1":[0,1,2,3,4], "col2":[2,2,0,0,1]}, columns=["col1", "col2"])
zeroToTwo = df.apply(lambda row: 1 if row["col1"] == 0 and row["col2"] == 2 else 0, axis=1)
oneToTwo = df.apply(lambda row: 1 if row["col1"] == 1 and row["col2"] == 2 else 0, axis=1)
print("N 0->2 = {}".format(zeroToTwo.sum()))
print("N 1->2 = {}".format(oneToTwo.sum()))

CodePudding user response:

Here is a way using zip() and value_counts()

(pd.Series(['0-2' if (col1 == 0 and col12 == 2) else '1-2' if (col1 == 1 and col2 == 2) else None for col1,col2 in zip(df['col1'],df['col2'])])
 .value_counts())

Output

0-2    1
1-2    1
dtype: int64
  • Related