I cannot change the values of a column with specific condition

Time:04-24

The table looks like the following:

text dummy1 days
op123ac 1 2000-01-01
op123ac 0 2000-01-04
op123ac 0 2000-01-07
op123ac 0 2000-01-10
op1248ab 0 2000-01-17
op1248ab 1 2000-01-20
op1248ab 1 2000-01-23
op1248ab 1 2000-01-26

Each unique "text" value appears in four rows, one for each of four distinct "days" values, and the "days" are consecutive within each "text". The problem is that each "text" must have a single "dummy1" value, either 1 or 0 for all four of its rows, so that:

df.groupby("text")['dummy1'].sum() # Each "text" should sum to either 0 or 4; no other value is acceptable.

I figured out that, within each "text", the erroneous "dummy1" values are always fewer than the correct ones. For example, "op123ac" has both 1 and 0 in the "dummy1" column, but there are three 0s and only one 1, so the correct "dummy1" should be 0. "op1248ab" has more 1s than 0s, so its correct value should be 1. The correct table should look like the following:

text dummy1 days
op123ac 0 2000-01-01
op123ac 0 2000-01-04
op123ac 0 2000-01-07
op123ac 0 2000-01-10
op1248ab 1 2000-01-17
op1248ab 1 2000-01-20
op1248ab 1 2000-01-23
op1248ab 1 2000-01-26

The number of 0s never equals the number of 1s for any "text" value, so there are no ties.

CodePudding user response:

 df['dummy1'] = df.groupby('text')['dummy1'].transform(lambda x: x.mode().iat[0])

Output

       text  dummy1        days
0   op123ac       0  2000-01-01
1   op123ac       0  2000-01-04
2   op123ac       0  2000-01-07
3   op123ac       0  2000-01-10
4  op1248ab       1  2000-01-17
5  op1248ab       1  2000-01-20
6  op1248ab       1  2000-01-23
7  op1248ab       1  2000-01-26

The mode of a set of values is the value that appears most often.
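For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the table in the question (the "days" column is assumed to hold dates), and the final check is the groupby sum described in the question. Note that `mode()` returns its results sorted, so `.iat[0]` would pick the smaller value if a tie ever occurred; the question states that ties cannot happen, so that case is not handled here.

```python
import pandas as pd

# Rebuild the sample table from the question (assumed dtypes: str, int, datetime).
df = pd.DataFrame({
    "text": ["op123ac"] * 4 + ["op1248ab"] * 4,
    "dummy1": [1, 0, 0, 0, 0, 1, 1, 1],
    "days": pd.to_datetime([
        "2000-01-01", "2000-01-04", "2000-01-07", "2000-01-10",
        "2000-01-17", "2000-01-20", "2000-01-23", "2000-01-26",
    ]),
})

# Replace every "dummy1" in a group with that group's most frequent value.
df["dummy1"] = df.groupby("text")["dummy1"].transform(lambda x: x.mode().iat[0])

# Sanity check from the question: each "text" should now sum to 0 or 4.
sums = df.groupby("text")["dummy1"].sum()
print(sums)
```

After the transform, the per-group sums should all be either 0 or 4, which is the condition the question asks for.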
