I am a beginner learning pandas and data science. I have data like the following:
Rahul
1
2
5
Suresh
4
2
1
Dharm
1
3
4
I would like it in my DataFrame as:
Rahul   1
        2
        5
Suresh  4
        2
        1
Dharm   1
        3
        4
How can I achieve this without iterating over every row? I have hundreds of thousands of rows. I have searched a lot but have not found anything other than iteration so far. Is there a better way?
Thank you for your kindness and patience.
CodePudding user response:
The best format depends on what you plan to do with the data, but a good starting point would be this:
Given:
Rahul
1
2
5
Suresh
4
2
1
Dharm
1
3
4
Doing:
import numpy as np
import pandas as pd

# Read in the file (filepath is the path to your data file) and call the column 'values':
df = pd.read_table(filepath, header=None, names=['values'])
# Create a new column with names filled in (numeric rows become NaN, then each name is filled downward):
df['names'] = df['values'].replace(r'\d+', np.nan, regex=True).ffill()
# Drop the extra rows:
df = df[df['values'].str.isnumeric()].reset_index(drop=True)
print(df[['names', 'values']])
Output:
    names  values
0   Rahul       1
1   Rahul       2
2   Rahul       5
3  Suresh       4
4  Suresh       2
5  Suresh       1
6   Dharm       1
7   Dharm       3
8   Dharm       4
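A possible variation on the same idea, in case it helps: the sketch below builds the names column from a boolean mask instead of a regex replace, and converts the values to integers at the end. It is only a sketch under a few assumptions: the sample data is inlined through io.StringIO in place of a real file, every value is a non-negative whole number, and the variable names raw, is_number and result are made up for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for a real file path.
raw = "Rahul\n1\n2\n5\nSuresh\n4\n2\n1\nDharm\n1\n3\n4\n"

# dtype=str keeps every line as text so the .str accessor works on the column.
df = pd.read_csv(io.StringIO(raw), header=None, names=["values"], dtype=str)

# True for rows that hold a number, False for rows that hold a name.
is_number = df["values"].str.isnumeric()

# Keep the name rows, blank out the number rows, then forward-fill each name downward.
df["names"] = df["values"].where(~is_number).ffill()

# Drop the name rows and convert the remaining values to integers.
result = df.loc[is_number, ["names", "values"]].reset_index(drop=True)
result["values"] = result["values"].astype(int)

print(result)
```

All of these steps are vectorized, so nothing loops over rows in Python. Note that str.isnumeric() returns False for strings such as "-1" or "2.5", so data with negative or decimal values would need a different test, for example pd.to_numeric(df["values"], errors="coerce").notna().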