Home > database >  How to create new column and add value before hyphen in a dataframe when splitting word with hyphen
How to create new column and add value before hyphen in a dataframe when splitting word with hyphen

Time:11-02

I want to create a new column and add values before hyphen in a dataframe when splitting word with hyphen in another column of same dataframe.That is , I want to show only the value before the hyphen in the new column.

code:

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'D sales', 'K sales', 'K sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales', 'S sales'],
    'product': ['ps', 'ss', 'ks', 'ds', 'ls', 'js', 'vs', 'es', 'rs',
            'os', 'Export-ps', 'Retail-is', 'Export-water', 'Retail-bottle', 'Export-salt', 'Farming-Fruits', 'Farming-Vegetables', 'Export-Tea',
            'Export-water', 'Retail-bottle', 'Retail-water', 'Farming-Vegetables', 'Farming-Vegetables', 'Export-salt'],
    'year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2023, 2023, 2023]
    }
df = pd.DataFrame(data)
new = df["product"].str.split(" ", n = 1, expand = True)
df["group"]= new[0]
print(df)

Output :

        month      kpi             product  year               group
0       April  D sales                  ps  2022                  ps
1         May  D sales                  ss  2022                  ss
2        June  D sales                  ks  2022                  ks
3        July  D sales                  ds  2022                  ds
4      August  D sales                  ls  2022                  ls
5   September  D sales                  js  2022                  js
6     October  D sales                  vs  2022                  vs
7    November  D sales                  es  2022                  es
8    December  D sales                  rs  2022                  rs
9     January  D sales                  os  2023                  os
10   February  K sales           Export-ps  2023           Export-ps
11      March  K sales           Retail-is  2023           Retail-is
12      April  S sales        Export-water  2022        Export-water
13        May  S sales       Retail-bottle  2022       Retail-bottle
14       June  S sales         Export-salt  2022         Export-salt
15       July  S sales      Farming-Fruits  2022      Farming-Fruits
16     August  S sales  Farming-Vegetables  2022  Farming-Vegetables
17  September  S sales          Export-Tea  2022          Export-Tea
18    October  S sales        Export-water  2022        Export-water
19   November  S sales       Retail-bottle  2022       Retail-bottle
20   December  S sales        Retail-water  2022        Retail-water
21    January  S sales  Farming-Vegetables  2023  Farming-Vegetables
22   February  S sales  Farming-Vegetables  2023  Farming-Vegetables
23      March  S sales         Export-salt  2023         Export-salt

Expected output is:

    month      kpi             product  year           group
0       April  D sales                  ps  2022                  
1         May  D sales                  ss  2022                  
2        June  D sales                  ks  2022                  
3        July  D sales                  ds  2022                  
4      August  D sales                  ls  2022                  
5   September  D sales                  js  2022                  
6     October  D sales                  vs  2022                  
7    November  D sales                  es  2022                  
8    December  D sales                  rs  2022                 
9     January  D sales                  os  2023                  
10   February  K sales           Export-ps  2023       Export           
11      March  K sales           Retail-is  2023       Retail        
12      April  S sales        Export-water  2022       Export
13        May  S sales       Retail-bottle  2022       Retail
14       June  S sales         Export-salt  2022       Export
15       July  S sales      Farming-Fruits  2022       Farming
16     August  S sales  Farming-Vegetables  2022       Farming
17  September  S sales          Export-Tea  2022       Export
18    October  S sales        Export-water  2022       Export
19   November  S sales       Retail-bottle  2022       Retail
20   December  S sales        Retail-water  2022       Retail
21    January  S sales  Farming-Vegetables  2023       Farming
22   February  S sales  Farming-Vegetables  2023       Farming
23      March  S sales         Export-salt  2023       Export

can anyone suggest a solution to get the expected output?

CodePudding user response:

Use Series.str.extract:

df["group"] = df["product"].str.extract(r'(.*)-')
# capturing everything before `-`, returning `NaN` if no match

print(df)

        month      kpi             product  year    group
0       April  D sales                  ps  2022      NaN
1         May  D sales                  ss  2022      NaN
2        June  D sales                  ks  2022      NaN
3        July  D sales                  ds  2022      NaN
4      August  D sales                  ls  2022      NaN
5   September  D sales                  js  2022      NaN
6     October  D sales                  vs  2022      NaN
7    November  D sales                  es  2022      NaN
8    December  D sales                  rs  2022      NaN
9     January  D sales                  os  2023      NaN
10   February  K sales           Export-ps  2023   Export
11      March  K sales           Retail-is  2023   Retail
12      April  S sales        Export-water  2022   Export
13        May  S sales       Retail-bottle  2022   Retail
14       June  S sales         Export-salt  2022   Export
15       July  S sales      Farming-Fruits  2022  Farming
16     August  S sales  Farming-Vegetables  2022  Farming
17  September  S sales          Export-Tea  2022   Export
18    October  S sales        Export-water  2022   Export
19   November  S sales       Retail-bottle  2022   Retail
20   December  S sales        Retail-water  2022   Retail
21    January  S sales  Farming-Vegetables  2023  Farming
22   February  S sales  Farming-Vegetables  2023  Farming
23      March  S sales         Export-salt  2023   Export

Chain Series.fillna if you want to replace the NaN values (e.g. with an empty string: .fillna('')).

  • Related