Home > OS >  Create boolean columns in DataFrame pandas for date and year
Create boolean columns in DataFrame pandas for date and year

Time:05-31

Let it be the following Python Panda DataFrame:

code date color counter
TTT 2021-02-01 00:30:11 blue 2
AVC 2022-02-01 10:00:01 red 12
BCH 2022-02-02 22:00:01 red 12
POD 2022-03-01 01:00:03 blue 1
UDY 2022-05-01 09:02:01 red 3
YUD 2022-12-01 09:02:01 green 0

I need to create a column by year and month that includes whether or not that row belongs (bool) to the specific month indicated in the column. An example of the resulting DataFrame is as follows:

code date color counter 2021-02 2022-02 2022-03 2022-05 2022-12
TTT 2021-02-01 00:30:11 blue 2 True False False False False
AVC 2022-02-01 10:00:01 red 12 False True False False False
BCH 2022-02-02 22:00:01 red 12 False True False False False
POD 2022-03-01 01:00:03 blue 1 False False True False False
UDY 2022-05-01 09:02:01 red 3 False False False True False
YUD 2022-12-01 09:02:01 green 0 False False False False True

To solve the problem only the information in the date column is used, but I include the complete DataFrame to avoid incompatibilities.

CodePudding user response:

Use get_dummies with Series.dt.strftime for YYYY-MM format and parameter dtype for booleans:

df = df.join(pd.get_dummies(df['date'].dt.strftime('%Y-%m'), dtype=bool))
print (df)
  code                date  color  counter  2021-02  2022-02  2022-03  \
0  TTT 2021-02-01 00:30:11   blue        2     True    False    False   
1  AVC 2022-02-01 10:00:01    red       12    False     True    False   
2  BCH 2022-02-02 22:00:01    red       12    False     True    False   
3  POD 2022-03-01 01:00:03   blue        1    False    False     True   
4  UDY 2022-05-01 09:02:01    red        3    False    False    False   
5  YUD 2022-12-01 09:02:01  green        0    False    False    False   

   2022-05  2022-12  
0    False    False  
1    False    False  
2    False    False  
3    False    False  
4     True    False  
5    False     True  

CodePudding user response:

You can try df.to_period with str.get_dummies

out = df.join(pd.to_datetime(df['date']).dt.to_period("M")
              .astype(str).str.get_dummies()
              .astype(bool))
print(out)

  code                 date  color  counter  2021-02  2022-02  2022-03  2022-05  2022-12
0  TTT  2021-02-01 00:30:11   blue        2     True    False    False    False    False
1  AVC  2022-02-01 10:00:01    red       12    False     True    False    False    False
2  BCH  2022-02-02 22:00:01    red       12    False     True    False    False    False
3  POD  2022-03-01 01:00:03   blue        1    False    False     True    False    False
4  UDY  2022-05-01 09:02:01    red        3    False    False    False     True    False
5  YUD  2022-12-01 09:02:01  green        0    False    False    False    False     True
  • Related