Sorry if the title is unclear; I wasn't sure how to word it. I have a dataframe with two columns, one for old IDs and one for new IDs.
df = pd.DataFrame({'old_id':['111', '2222','3333', '4444'], 'new_id':['5555','6666','777','8888']})
I'm trying to figure out a way to check the string length of every cell and return any IDs that don't match the required string length of 4 in a new dataframe. This will eventually turn into a dictionary of incorrect IDs.
This is the approach I'm currently taking:
incorrect_id_df = df[df.applymap(lambda x: len(x) != 4)]
and the current output:
  old_id new_id
0    111    NaN
1    NaN    NaN
2    NaN    777
3    NaN    NaN
I'm not sure where to go from here, and I'm sure there's a much better approach. This is the output I'm looking for: a single-column dataframe, with the column named id, containing only the IDs that don't match the required string length:
id
111
777
CodePudding user response:
Here's part of an answer:
df = pd.DataFrame({'old_id':['111', '2222','3333', '4444'], 'new_id':['5555','6666','777','8888']})
all_ids = df.values.flatten()
bad_ids = [bad_id for bad_id in all_ids if len(bad_id) != 4]
bad_ids
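To get from that list to the single-column dataframe the question asks for, one option is to wrap it in a new DataFrame. This is only a sketch: REQUIRED_LEN and incorrect_id_df are names chosen here for illustration, and it assumes every cell is a string.

```python
import pandas as pd

df = pd.DataFrame({'old_id': ['111', '2222', '3333', '4444'],
                   'new_id': ['5555', '6666', '777', '8888']})

REQUIRED_LEN = 4  # required ID length, taken from the question

# Flatten the cells row by row, then keep only the IDs with the wrong length.
all_ids = df.to_numpy().flatten()
bad_ids = [bad_id for bad_id in all_ids if len(bad_id) != REQUIRED_LEN]

# Wrap the list in a single-column dataframe named 'id'.
incorrect_id_df = pd.DataFrame({'id': bad_ids})
print(incorrect_id_df)
```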
CodePudding user response:
Or, if you are not completely sure what you are doing, you can always use the brute-force method :D
import pandas as pd
df = pd.DataFrame({'old_id':['111', '2222','3333', '4444'], 'new_id':['5555','6666','777','8888']})
rows, columns = df.shape
# print(df)
for row in range(rows):
    k = df.loc[row]  # the current row as a Series
    for column in range(columns):
        # print(k.iloc[column])
        if len(k.iloc[column]) != 4:
            print("Bad size of ID on row: " + str(row) + " column: " + str(column))
CodePudding user response:
As commented by Jon Clements, stack could be useful here – it basically stacks (duh) all columns on top of each other:
>>> df[df.applymap(len) != 4].stack().reset_index(drop=True)
0 111
1 777
dtype: object
To turn that into a single-column df named id, you can extend it with a .rename('id').to_frame().
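Put together, the whole chain might look like the sketch below. Two things here are my own choices rather than part of the answer above: the lengths are computed with .str.len() per column instead of applymap, and there is an explicit dropna() after stack as a defensive step, in case the pandas version in use keeps the masked NaN cells when stacking.

```python
import pandas as pd

df = pd.DataFrame({'old_id': ['111', '2222', '3333', '4444'],
                   'new_id': ['5555', '6666', '777', '8888']})

# Length of every cell, computed column by column (assumes string cells).
lengths = df.apply(lambda col: col.str.len())

incorrect_id_df = (
    df[lengths != 4]          # cells of length 4 are masked to NaN
    .stack()                  # one long Series indexed by (row, column)
    .dropna()                 # defensive: remove any masked cells that remain
    .reset_index(drop=True)   # replace the (row, column) index with 0..n-1
    .rename('id')             # name the Series so the column is called 'id'
    .to_frame()
)
print(incorrect_id_df)
```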
CodePudding user response:
I would stack both columns into a single one, and select the IDs whose length is not 4:
ids = df.stack()
bad_ids = ids[ids.str.len() != 4]
Output:
>>> bad_ids
0 old_id 111
2 new_id 777
dtype: object
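Because the stacked index keeps the original row and column, the same Series can feed both the single-column dataframe and a dictionary. The question doesn't say what shape the dictionary should have, so grouping the bad IDs by the column they came from is purely an assumption here, and incorrect_id_df and bad_ids_by_column are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'old_id': ['111', '2222', '3333', '4444'],
                   'new_id': ['5555', '6666', '777', '8888']})

ids = df.stack()                    # index is (row, column name)
bad_ids = ids[ids.str.len() != 4]   # keep only the IDs with the wrong length

# Single-column dataframe named 'id', as requested in the question.
incorrect_id_df = bad_ids.reset_index(drop=True).to_frame('id')

# Assumed dictionary layout: source column -> list of bad IDs.
bad_ids_by_column = bad_ids.groupby(level=1).agg(list).to_dict()

print(incorrect_id_df)
print(bad_ids_by_column)
```

If the source column doesn't matter, incorrect_id_df['id'].tolist() gives a flat list instead.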