This is my current code:
    df['company_id'] = ''
    length = 0
    while length < len(df):
        for x in df:
            if df['associations.companies.results'][length] == 'nan':
                df.loc[df['associations.companies.results'] == 'nan', 'company_id'] = 0
            else:
                df['company_id'][length] = df['associations.companies.results'][length][0]['id']
        length = length + 1
I also tried versions of this code using a lambda and np.where, but these gave errors that I couldn't solve. The data set has close to 40 rows, and I am trying to get the company ID out of a dict nested in a list. Each row looks like this:
[{'id': 'XXXXXXXXXX', 'type': 'call_to_company'}]
Sometimes there is no company ID, and the row looks like this:
nan
The final result would be a separate column called "company_id" that contains the 'id' value.
Right now the code has been running for 30 minutes and is still going.
I hope someone can help. Thanks!
CodePudding user response:
There are various improvements that you could make, but I'm still not entirely sure what kind of output you are expecting.
First of all, you call len() on every iteration because it sits in the header of the while loop. The call itself is cheap, but it only needs to run once.
Second: you have a for loop nested inside the while loop (I think because you wanted to iterate through both the indexes and the elements). This is the bigger problem: "for x in df" iterates over the column labels, so the body runs once per column for every row instead of once per row, and the df.loc line inside it rescans the whole column on every pass. That is far more work than the O(n) you actually need.
You could use enumerate() on the column, or simply use only the indexes:
    df['company_id'] = ''
    for i in range(len(df)):
        # Assumes a default 0..n-1 index and that missing values are the string 'nan'
        if df['associations.companies.results'][i] == 'nan':
            df.at[i, 'company_id'] = 0
        else:
            # .at writes a single cell and avoids chained assignment
            df.at[i, 'company_id'] = df['associations.companies.results'][i][0]['id']
I'm sure this could be further improved with a list comprehension or DataFrame .apply(), but I still don't understand your goal, so this is the most I can do.
If you have never heard of Big-O notation, I recommend reading up on it.
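As a rough sketch of the list-comprehension idea mentioned in this answer: the two-row sample frame and the helper name first_company_id are made up for illustration, and I am assuming each cell is either a list holding one dict or a real missing value (NaN). Missing rows get 0, as in the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the two row shapes from the question.
df = pd.DataFrame({
    'associations.companies.results': [
        [{'id': 'XXXXXXXXXX', 'type': 'call_to_company'}],
        np.nan,
    ]
})


def first_company_id(cell):
    """Return the 'id' of the first dict in the list, or 0 for anything else."""
    if isinstance(cell, list) and cell:
        return cell[0]['id']
    return 0


# One pass over the column, with no per-row DataFrame indexing.
df['company_id'] = [
    first_company_id(cell) for cell in df['associations.companies.results']
]
```

The same helper should also work with df['associations.companies.results'].apply(first_company_id) if you prefer .apply() over a comprehension.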
CodePudding user response:
I hope I understood your use case, so here is my idea:
Try using a for loop with enumerate()! With this, you can avoid having a counter variable entirely.
Like so:
    df['company_id'] = ''
    # Enumerate the column's values; enumerate(df) would yield column labels instead
    for i, x in enumerate(df['associations.companies.results']):
        # Assumes missing values are stored as the string 'nan'
        if x == 'nan':
            df.at[df.index[i], 'company_id'] = 0
        else:
            df.at[df.index[i], 'company_id'] = x[0]['id']
Sadly, your code is not really reproducible, so I hope I understood it correctly.
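One thing to watch for with enumerate(): iterating over a DataFrame directly yields its column labels, not its rows, so the loop has to walk the column itself. Here is a small sketch of that, using a made-up two-row frame and assuming missing values are real NaN rather than the string 'nan':

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the two row shapes from the question.
df = pd.DataFrame({
    'associations.companies.results': [
        [{'id': 'XXXXXXXXXX', 'type': 'call_to_company'}],
        np.nan,
    ]
})

# Iterating the frame itself gives column labels, not row values.
column_labels = list(df)

company_ids = []
for i, cell in enumerate(df['associations.companies.results']):
    # i is the row position, cell is the value stored in that row.
    company_ids.append(cell[0]['id'] if isinstance(cell, list) else 0)

# Assign the whole column at once instead of writing cell by cell.
df['company_id'] = company_ids
```

Collecting the values in a plain list and assigning once keeps the loop free of per-row DataFrame writes, which tend to be the slow part.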