Suppose I have df1['col1']
and df2['col2']
both columns equal in length.
The values in the columns are 0
, 1
and 2
only.
How do I find the counts of changes from 0
-> 2
and 1
-> 2
from df1
to df2
if each corresponding row in the df
is a transition about an element?
As an example, I need to count the changes in the 2 columns below
print(df1['orig_label'][0:5])
Name: predicted_label, dtype: int64
0 2
1 2
2 0
3 0
4 1
print(df2['predicted_label'][0:5])
Name: predicted_label, dtype: int64
0 1
1 1
2 0
3 2
4 2
Expected output: 1 and 1
which are counts of 0->2 and 1->2 transition
CodePudding user response:
This is a use case for pandas.crosstab
:
pd.crosstab(df1['col1'], df2['col2'])
Output:
col2 0 1 2
col1
0 1 0 1
1 0 0 1
2 0 2 0
To get only 0->2 and 1->2:
pd.crosstab(df1['col1'], df2['col2']).loc[[0,1], [2]]
Output:
col2 2
col1
0 1
1 1
CodePudding user response:
How about this?:
df1 = pd.DataFrame(data={"col1":[0,1,2,3,4], "col2":[2,2,0,0,1]}, columns=["col1", "col2"])
df["0->2"] = df.apply(lambda row: 1 if row["col1"] == 0 and row["col2"] == 2 else 0, axis=1)
df["1->2"] = df.apply(lambda row: 1 if row["col1"] == 1 and row["col2"] == 2 else 0, axis=1)
print("N 0->2 = {}".format(df["0->2"].sum()))
print("N 1->2 = {}".format(df["1->2"].sum()))
This has the potential downside of adding two additional columns to your original dataframe, but you could also just create them as separate series objects if you don't want to do that:
df = pd.DataFrame(data={"col1":[0,1,2,3,4], "col2":[2,2,0,0,1]}, columns=["col1", "col2"])
zeroToTwo = df.apply(lambda row: 1 if row["col1"] == 0 and row["col2"] == 2 else 0, axis=1)
oneToTwo = df.apply(lambda row: 1 if row["col1"] == 1 and row["col2"] == 2 else 0, axis=1)
print("N 0->2 = {}".format(zeroToTwo.sum()))
print("N 1->2 = {}".format(oneToTwo.sum()))
CodePudding user response:
Here is a way using zip()
and value_counts()
(pd.Series(['0-2' if (col1 == 0 and col12 == 2) else '1-2' if (col1 == 1 and col2 == 2) else None for col1,col2 in zip(df['col1'],df['col2'])])
.value_counts())
Output
0-2 1
1-2 1
dtype: int64