I want to create a new column and add values before hyphen in a dataframe when splitting word with hyphen in another column of same dataframe.That is , I want to show only the value before the hyphen in the new column.
code:
data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
'kpi': ['D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'K sales', 'K sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales'],
'product': ['ps', 'ss', 'ks', 'ds', 'ls', 'js', 'vs', 'es', 'rs',
'os', 'Export-ps', 'Retail-is', 'Export-water', 'Retail-bottle', 'Export-salt', 'Farming-Fruits', 'Farming-Vegetables', 'Export-Tea',
'Export-water', 'Retail-bottle', 'Retail-water', 'Farming-Vegetables', 'Farming-Vegetables', 'Export-salt'],
'year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023]
}
df = pd.DataFrame(data)
new = df["product"].str.split(" ", n = 1, expand = True)
df["group"]= new[0]
print(df)
Output :
month kpi product year group
0 April D sales ps 2022 ps
1 May D sales ss 2022 ss
2 June D sales ks 2022 ks
3 July D sales ds 2022 ds
4 August D sales ls 2022 ls
5 September D sales js 2022 js
6 October D sales vs 2022 vs
7 November D sales es 2022 es
8 December D sales rs 2022 rs
9 January D sales os 2023 os
10 February K sales Export-ps 2023 Export-ps
11 March K sales Retail-is 2023 Retail-is
12 April S sales Export-water 2022 Export-water
13 May S sales Retail-bottle 2022 Retail-bottle
14 June S sales Export-salt 2022 Export-salt
15 July S sales Farming-Fruits 2022 Farming-Fruits
16 August S sales Farming-Vegetables 2022 Farming-Vegetables
17 September S sales Export-Tea 2022 Export-Tea
18 October S sales Export-water 2022 Export-water
19 November S sales Retail-bottle 2022 Retail-bottle
20 December S sales Retail-water 2022 Retail-water
21 January S sales Farming-Vegetables 2023 Farming-Vegetables
22 February S sales Farming-Vegetables 2023 Farming-Vegetables
23 March S sales Export-salt 2023 Export-salt
Expected output is:
month kpi product year group
0 April D sales ps 2022
1 May D sales ss 2022
2 June D sales ks 2022
3 July D sales ds 2022
4 August D sales ls 2022
5 September D sales js 2022
6 October D sales vs 2022
7 November D sales es 2022
8 December D sales rs 2022
9 January D sales os 2023
10 February K sales Export-ps 2023 Export
11 March K sales Retail-is 2023 Retail
12 April S sales Export-water 2022 Export
13 May S sales Retail-bottle 2022 Retail
14 June S sales Export-salt 2022 Export
15 July S sales Farming-Fruits 2022 Farming
16 August S sales Farming-Vegetables 2022 Farming
17 September S sales Export-Tea 2022 Export
18 October S sales Export-water 2022 Export
19 November S sales Retail-bottle 2022 Retail
20 December S sales Retail-water 2022 Retail
21 January S sales Farming-Vegetables 2023 Farming
22 February S sales Farming-Vegetables 2023 Farming
23 March S sales Export-salt 2023 Export
can anyone suggest a solution to get the expected output?
CodePudding user response:
Use Series.str.extract
:
df["group"] = df["product"].str.extract(r'(.*)-')
# capturing everything before `-`, returning `NaN` if no match
print(df)
month kpi product year group
0 April D sales ps 2022 NaN
1 May D sales ss 2022 NaN
2 June D sales ks 2022 NaN
3 July D sales ds 2022 NaN
4 August D sales ls 2022 NaN
5 September D sales js 2022 NaN
6 October D sales vs 2022 NaN
7 November D sales es 2022 NaN
8 December D sales rs 2022 NaN
9 January D sales os 2023 NaN
10 February K sales Export-ps 2023 Export
11 March K sales Retail-is 2023 Retail
12 April S sales Export-water 2022 Export
13 May S sales Retail-bottle 2022 Retail
14 June S sales Export-salt 2022 Export
15 July S sales Farming-Fruits 2022 Farming
16 August S sales Farming-Vegetables 2022 Farming
17 September S sales Export-Tea 2022 Export
18 October S sales Export-water 2022 Export
19 November S sales Retail-bottle 2022 Retail
20 December S sales Retail-water 2022 Retail
21 January S sales Farming-Vegetables 2023 Farming
22 February S sales Farming-Vegetables 2023 Farming
23 March S sales Export-salt 2023 Export
Chain Series.fillna
if you want to replace the NaN
values (e.g. with an empty string: .fillna('')
).