Given a DataFrame df
:
id items
0 1 CRB Spot Index: Grease: First 2 Months: YoY
1 2 CRB Spot Index: Textiles
2 3 CRB Spot Index: Livestock
3 4 Korea: Export Value: First 10 Days: YoY
4 5 Korea: Export Value: First 20 Days: YoY
5 6 Korea: Import Value: First 20 Days: YoY
6 7 Korea: Import Value: First 10 Days: YoY
7 8 Vietnam: Import Value: First 2 Weeks: MoM
8 9 Vietnam: Import Value: First 3 Weeks: MoM
I want to filter items which match patterns: First integer Days
or First integer Weeks
, then create a new column to indicate them:
id items indicator
0 1 CRB Spot Index: Grease: First 2 Months: YoY NaN
1 2 CRB Spot Index: Textiles NaN
2 3 CRB Spot Index: Livestock NaN
3 4 Korea: Export Value: First 10 Days: YoY Y
4 5 Korea: Export Value: First 20 Days: YoY Y
5 6 Korea: Import Value: First 20 Days: YoY Y
6 7 Korea: Import Value: First 10 Days: YoY Y
7 8 Vietnam: Import Value: First 2 Weeks: MoM Y
8 9 Vietnam: Import Value: First 3 Weeks: MoM Y
How could I do that using Pandas?
I use df.loc[(df['items'].str.contains('First|Days', na=False)) & (df['items'].str.contains('First|Weeks', na=False)), 'indicators']='Y'
, it doesn't generate an expected result as shown above, also maybe there are other more concise answers.
CodePudding user response:
Check regex for match First
, then \s
for space(s), then \d
for integers, \s
for again space(s) and last Days|Weeks
for Days
or Weeks
:
df.loc[df['items'].str.contains('First\s \d \s Days|Weeks', na=False), 'indicators']='Y'
print (df)
id items indicators
0 1 CRB Spot Index: Grease: First 2 Months: YoY NaN
1 2 CRB Spot Index: Textiles NaN
2 3 CRB Spot Index: Livestock NaN
3 4 Korea: Export Value: First 10 Days: YoY Y
4 5 Korea: Export Value: First 20 Days: YoY Y
5 6 Korea: Import Value: First 20 Days: YoY Y
6 7 Korea: Import Value: First 10 Days: YoY Y
7 8 Vietnam: Import Value: First 2 Weeks: MoM Y
8 9 Vietnam: Import Value: First 3 Weeks: MoM Y