I have a subset of a much larger dataframe:
import pandas as pd
data = {'Name': ['car- at', 'car- ar', 'truck- br', 'car- at'],
'SysNum': [1, 2, 3, 4 ],
}
Name SysNum
0 car- at 1
1 car- ar 2
2 truck- br 3
3 car- at 4
I need to be able to enter the numbers into the name after the - . The dash is in each name so I tried to use a couple Regex lines but I am unable to get it so far. The dataframe should look like:
Name SysNum Count
0 car- at 1 car-1 at
1 car- ar 2 car-2 ar
2 truck- br 3 truck-3 br
3 car- at 4 car-4 at
I have used something similar before:
df['BaseName'] = [re.sub('(?<=-)\d{1,6}', '', s) for s in df['Name'].values]
df['SysNum'] = [int(re.search('(?<=-)\d{1,6}', s)[0]) for s in df['Name'].values]
df['NewName'] = df['BaseName'] df['SysNum'].astype(str)
This takes away the numbers after the - and then I add a new set of numbers back onto the end with the last line. I now have a set of letters at the end so I need to be able to put in a new set of numbers after the - and keep the letters on at the end. A new solution to this or a way to use regex that looks at the - and stop at the next set of letters should work. Thanks.
CodePudding user response:
You can use
df['Count'] = df.apply(lambda x: x['Name'].replace('- ', f"-{x['SysNum']} "), axis=1)
# => df['Count']
# 0 car-1 at
# 1 car-2 ar
# 2 truck-3 br
# 3 car-4 at
# Name: Count, dtype: object
This is just replacing -
space with -
SysNum column value space.
If you want to make sure you insert the values after the first word that has a -
after it, you may use a regex:
df['Count'] = df.apply(lambda x: re.sub(r'^\w -', fr"\g<0>{x['SysNum']}", x["Name"]), axis=1)
# => df
# Name SysNum Count
# 0 car- at 1 car-1 at
# 1 car- ar 2 car-2 ar
# 2 truck- br 3 truck-3 br
# 3 car- at 4 car-4 at
Here, ^\w -
matches start of string (^
), \w
matches one or more letters/digits/_
, and then -
is matched. The replacement is the whole match value (\g<0>
) the value from the SysNum
column.
CodePudding user response:
You can use zip
and list comprehension!.
>>> df['NewCol'] = [name.replace("-", f"-{num}", 1) for name, num in zip(df.Name, df.SysNum)]
>>> df
Name SysNum NewCol
0 car- at 1 car-1 at
1 car- ar 2 car-2 ar
2 truck- br 3 truck-3 br
3 car- at 4 car-4 at
Note: The third argument to the replace
method ensure that we replace only the first occurrence of -
CodePudding user response:
One approach:
df["Count"] = [f"{pre}-{num} {suf}" for num, [pre, suf] in zip(df["SysNum"], df["Name"].str.split("- "))]
print(df)
Output
Name SysNum Count
0 car- at 1 car-1 at
1 car- ar 2 car-2 ar
2 truck- br 3 truck-3 br
3 car- at 4 car-4 at
CodePudding user response:
parts = df['Name'].str.split('-', n=1)
df['NewName'] = parts.str[0] '-' df['SysNum'].astype(str) parts.str[1]
Test:
>>> df
Name SysNum NewName
0 car- at 1 car-1 at
1 car- ar 2 car-2 ar
2 truck- br 3 truck-3 br
3 car- at 4 car-4 at