I apologize for asking such a basic question, but I've been stuck at this point for almost a week.
I have the dataframe below. There are anomalies in the name column, but I have been able to fix some of them using the code below:
names = ['a', 'an', 'my', 'by', 'mad', 'very', 'just', 'quite', 'one', 'actually', 'life', 'light', 'officially','his', 'old', 'this', 'all','the']
archive[archive['name'].isin(names) & archive['text'].str.contains('named')]['text'].str.split('named').str[1].str.split('.').str[0]
I get the output below:
1853 Wylie
1955 Kip
2034 Jacob (Yacōb)
2066 Rufus
2116 Spork
2125 Cherokee
2128 Hemry
2146 Alphred
2161 Alfredo
2191 Leroi
2204 Berta
2218 Chuk
2235 Alfonso
2249 Cheryl
2255 Jessiga
2264 Klint
2273 Kohl
2304 Pepe
2311 Octaviath
2314 Johm
Name: text, dtype: object
But I want to apply these changes to the dataframe itself, and I'm not sure how to go about it. Any help please?
CodePudding user response:
IIUC, you can conditionally assign the split text column to the name column with .loc:
m = archive['name'].isin(names) & archive['text'].str.contains('named')
archive.loc[m, 'name'] = archive['text'].str.split('named').str[1].str.split('.').str[0]
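To see the assignment work end to end, here is a minimal sketch on a made-up dataframe. The rows and the shortened names list are hypothetical stand-ins for your archive data, and the trailing .str.strip() is my own addition, on the assumption that you don't want the leading space that splitting on 'named' leaves behind.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `archive` dataframe.
archive = pd.DataFrame({
    "name": ["a", "Bella", "the", "quite"],
    "text": [
        "This is a dog named Wylie. He is very good.",
        "This is Bella. She likes naps.",
        "Here is the pup named Kip. 12/10",
        "He is quite a good boy. 11/10",
    ],
})
names = ["a", "the", "quite"]  # shortened version of the list in the question

# Rows whose name is a placeholder word AND whose text contains 'named'.
m = archive["name"].isin(names) & archive["text"].str.contains("named")

# Take the text after 'named', up to the first period, without surrounding spaces.
extracted = (
    archive.loc[m, "text"]
    .str.split("named").str[1]
    .str.split(".").str[0]
    .str.strip()  # assumption: drop the space left after 'named'
)

# Write the extracted names back only into the masked rows.
archive.loc[m, "name"] = extracted
print(archive["name"].tolist())
```

Rows where the mask is False keep their original name, so 'Bella' is untouched, and 'quite' stays as it is because its text has no 'named' in it.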