I have a data frame like this:
df1 = pd.DataFrame({'ID': ['T1002, T5006, T5007, Stay home', 'Go for walk, T5007, T5007, Stay home']})
ID
0 T1002, T5006, T5007, Stay home
1 Go for walk, T5007, T5007, Stay home
I want to take the first two comma-separated items from each row, join them with an underscore, and put the result in a new column.
Expected outcome:
New_id ID
0 T1002_T5006 T1002, T5006, T5007, Stay home
1 Go for walk_T5007 Go for walk, T5007, T5007, Stay home
I tried this but it did not work:
df1['New_id']= df1["ID"].str.split(',').str.join(sep=" ")
Any ideas?
CodePudding user response:
Assuming the dataframe df looks like this:
df = pd.DataFrame({'ID': ['T1002, T5006, T5007, Stay home', 'Go for walk, T5007, T5007, Stay home']})
[Out]:
ID
0 T1002, T5006, T5007, Stay home
1 Go for walk, T5007, T5007, Stay home
Then the following will do the job:
df['New_id'] = df['ID'].str.split(',').str[:2].str.join('_')
[Out]:
ID New_id
0 T1002, T5006, T5007, Stay home T1002_ T5006
1 Go for walk, T5007, T5007, Stay home Go for walk_ T5007
Notes:
df['ID'] selects the column ID from the dataframe df.
.str.split(',') splits each string at the commas.
.str[:2] takes the first two items of each resulting list.
.str.join('_') joins those items with an underscore between them. Because the split is on ',' alone, the second item keeps the space that followed the comma, which is why the result reads T1002_ T5006. One could instead use .str.join(''), and with that the output would be
ID New_id
0 T1002, T5006, T5007, Stay home T1002 T5006
1 Go for walk, T5007, T5007, Stay home Go for walk T5007
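The outputs above still carry the space that followed each comma (T1002_ T5006 rather than T1002_T5006). A minimal sketch of one way around that, assuming every separator in the data is exactly a comma followed by one space, is to split on ', ' instead of ',':

```python
import pandas as pd

df = pd.DataFrame({'ID': ['T1002, T5006, T5007, Stay home',
                          'Go for walk, T5007, T5007, Stay home']})

# Assumption: items are always separated by ", " (comma plus one space),
# so splitting on that exact separator leaves no stray whitespace.
df['New_id'] = df['ID'].str.split(', ').str[:2].str.join('_')

print(df['New_id'].tolist())  # ['T1002_T5006', 'Go for walk_T5007']
```

If the spacing after the commas is not consistent, stripping each item individually is likely the safer route.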
CodePudding user response:
Try this, which strips the whitespace from each item before joining:
df["New_id"] = df['ID'].map(lambda x: '_'.join([i.strip() for i in x.split(',')[:2]]))
ID New_id
0 T1002, T5006, T5007, Stay home T1002_T5006
1 Go for walk, T5007, T5007, Stay home Go for walk_T5007
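To see what the lambda does to a single value, the same steps can be traced in plain Python. This is only an illustrative sketch; the helper name first_two_joined and its sep parameter are made up here and are not part of pandas:

```python
def first_two_joined(text, sep='_'):
    """Hypothetical helper mirroring the lambda above, one step per line."""
    parts = text.split(',')                    # e.g. ['T1002', ' T5006', ' T5007', ' Stay home']
    first_two = parts[:2]                      # keep only the first two items
    cleaned = [p.strip() for p in first_two]   # drop the space left after each comma
    return sep.join(cleaned)                   # glue the items together with the separator


print(first_two_joined('T1002, T5006, T5007, Stay home'))        # T1002_T5006
print(first_two_joined('Go for walk, T5007, T5007, Stay home'))  # Go for walk_T5007
```

A named function like this could be passed to map in place of the lambda, as in df['ID'].map(first_two_joined).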