I have this dataframe:
          0     1       2       3
0     Frank  48.2  test_1  file_1
1      John  46.7  test_1  file_1
2     Alice  39.3  test_2  file_2
3       Kim  35.6  test_2  file_2
4     Sasha  25.5  test_3  file_3
...
2306 rows × 4 columns
For every distinct value in column 2 (there are 140 of them), I want to insert a new row just before the first row holding that value. The new row should keep the file name in column 3, because I will later need that column to split the dataframe into separate files. The result should look like this:
       0     1       2       3
0                test_1  file_1
1  Frank  48.2   test_1  file_1
2   John  46.7   test_1  file_1
3                test_2  file_2
4  Alice  39.3   test_2  file_2
5    Kim  35.6   test_2  file_2
6                test_3  file_3
7  Sasha  25.5   test_3  file_3
...
What is the simplest way to achieve this? Thank you for your time!
CodePudding user response:
You can build the header rows with drop_duplicates, then concat them back:
# One header row per (test, file) pair, with the test name moved into column '0'
s = df.drop_duplicates(['2', '3']).drop(['0', '1'], axis=1).rename({'2': '0'}, axis=1)
# A stable sort keeps each header row ahead of the data rows sharing its index
out = pd.concat([s, df]).sort_index(kind='stable').reindex(columns=df.columns)
out
Out[15]:
        0     1       2       3
0  test_1   NaN     NaN  file_1
0   Frank  48.2  test_1  file_1
1    John  46.7  test_1  file_1
2  test_2   NaN     NaN  file_2
2   Alice  39.3  test_2  file_2
3     Kim  35.6  test_2  file_2
4  test_3   NaN     NaN  file_3
4   Sasha  25.5  test_3  file_3
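Here is a self-contained sketch of the same approach on a small stand-in for the asker's data (string column labels assumed, as in the snippet above), with a final reset_index to renumber the rows 0..n-1 as in the desired output:

```python
import pandas as pd

# Stand-in for the asker's 2306-row dataframe
df = pd.DataFrame(
    {
        "0": ["Frank", "John", "Alice", "Kim", "Sasha"],
        "1": [48.2, 46.7, 39.3, 35.6, 25.5],
        "2": ["test_1", "test_1", "test_2", "test_2", "test_3"],
        "3": ["file_1", "file_1", "file_2", "file_2", "file_3"],
    }
)

# One header row per (test, file) pair, with the test name moved into column "0"
s = df.drop_duplicates(["2", "3"]).drop(["0", "1"], axis=1).rename({"2": "0"}, axis=1)

out = (
    pd.concat([s, df])
    .sort_index(kind="stable")       # header rows keep their place before their data rows
    .reindex(columns=df.columns)     # restore column order; missing cells become NaN
    .reset_index(drop=True)          # renumber rows as in the desired output
)
```

The header rows end up with NaN in columns "1" and "2"; if you prefer empty strings, a fillna("") at the end will do it.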
CodePudding user response:
You can filter the rows that share a given value of column 2, prepend to that DataFrame the header row you want, and concatenate all the resulting DataFrames into one. For example:
import pandas as pd

df = <READ_YOUR_DF>

all_df = []
for i in df["2"].unique():
    # Header row: test name in column "0", empty strings elsewhere
    new_df = pd.DataFrame(data={"0": [i], "1": [""], "2": [""], "3": [""]})
    # Rows belonging to the current test
    filter_df = df[df["2"] == i]
    # Header row first, then its data rows
    to_add = pd.concat([new_df, filter_df], ignore_index=True)
    all_df.append(to_add)

result_df = pd.concat(all_df, ignore_index=True)
If you want to avoid listing all the column names when creating new_df, you can build the dictionary with a comprehension that iterates over df.columns as keys.
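That comprehension could look like this (a sketch following the loop above, on sample data standing in for the asker's dataframe; column "0" receives the test name, every other column an empty string):

```python
import pandas as pd

# Sample data standing in for the asker's dataframe
df = pd.DataFrame(
    {
        "0": ["Frank", "Alice"],
        "1": [48.2, 39.3],
        "2": ["test_1", "test_2"],
        "3": ["file_1", "file_2"],
    }
)

all_df = []
for i in df["2"].unique():
    # Build the header row without spelling out each column name
    new_df = pd.DataFrame(data={col: [i] if col == "0" else [""] for col in df.columns})
    filter_df = df[df["2"] == i]
    all_df.append(pd.concat([new_df, filter_df], ignore_index=True))

result_df = pd.concat(all_df, ignore_index=True)
```

This way the loop keeps working unchanged if the dataframe gains or loses columns.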