I am trying to label the rows of a pandas DataFrame with integer labels 0, 1, 2, etc., so that paths in the same theta folder share a label.
The code I have so far is:
import pandas as pd
# initialise data of lists.
data = {'Path':['/content/gdrive/MyDrive/DOA_dataset/theta_30/s1', '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2', '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1','/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
'/content/gdrive/MyDrive/DOA_dataset/theta_90/s1','/content/gdrive/MyDrive/DOA_dataset/theta_90/s2']}
# Create DataFrame
df = pd.DataFrame(data)
Expected output:
Path Label
0 /content/gdrive/MyDrive/DOA_dataset/theta_30/s1 0
1 /content/gdrive/MyDrive/DOA_dataset/theta_30/s2 0
2 /content/gdrive/MyDrive/DOA_dataset/theta_60/s1 1
3 /content/gdrive/MyDrive/DOA_dataset/theta_60/s2 1
4 /content/gdrive/MyDrive/DOA_dataset/theta_90/s1 2
5 /content/gdrive/MyDrive/DOA_dataset/theta_90/s2 2
I tried several approaches, such as matching a pattern with re, but none of them worked. Any help would be appreciated.
CodePudding user response:
Use rsplit to drop the last path component, then factorize the remaining parent directory:
df['new'] = df.Path.str.rsplit('/',n=1).str[0].factorize()[0]
df
Path new
0 /content/gdrive/MyDrive/DOA_dataset/theta_30/s1 0
1 /content/gdrive/MyDrive/DOA_dataset/theta_30/s2 0
2 /content/gdrive/MyDrive/DOA_dataset/theta_60/s1 1
3 /content/gdrive/MyDrive/DOA_dataset/theta_60/s2 1
4 /content/gdrive/MyDrive/DOA_dataset/theta_90/s1 2
5 /content/gdrive/MyDrive/DOA_dataset/theta_90/s2 2
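Since the question mentions trying re patterns, here is a rough regex-based variant of the same idea. It is only a sketch: it assumes every path contains a folder named theta_<number>, and the column name Label is taken from the expected output in the question. Note that factorize numbers labels by first appearance by default; passing sort=True numbers them by ascending angle instead, which only matters if the rows are not already in order.

```python
import pandas as pd

paths = [
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s2',
]
df = pd.DataFrame({'Path': paths})

# Pull the digits after "theta_" out of each path.
# Assumption: every path has such a folder; otherwise astype(int) fails on NaN.
angle = df['Path'].str.extract(r'theta_(\d+)', expand=False).astype(int)

# sort=True assigns labels by ascending angle rather than by first appearance.
codes, uniques = pd.factorize(angle, sort=True)
df['Label'] = codes

print(df)
```

Here uniques holds the distinct angles in label order, so uniques[label] maps a label back to its angle.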
CodePudding user response:
If you want two columns in your DataFrame, say x and y, you need to build it like this:
df = pd.DataFrame({"First_column":x,"Second_column":y})
So you could build x and y in a loop over your file paths before creating the DataFrame.
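One possible way to write that loop is sketched below. It assumes the label should depend on the parent directory of each path, as in the expected output of the question; the helper dictionary label_of and the column names Path and Label are illustrative choices, not part of the original answer.

```python
import pandas as pd
from pathlib import PurePosixPath

paths = [
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_30/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_60/s2',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s1',
    '/content/gdrive/MyDrive/DOA_dataset/theta_90/s2',
]

x = []          # paths
y = []          # integer labels
label_of = {}   # parent directory -> label, numbered in order of first appearance

for p in paths:
    parent = str(PurePosixPath(p).parent)
    # A directory seen for the first time gets the next unused integer.
    if parent not in label_of:
        label_of[parent] = len(label_of)
    x.append(p)
    y.append(label_of[parent])

df = pd.DataFrame({"Path": x, "Label": y})
print(df)
```

For this data the loop yields the same labels as the factorize approach; the explicit dictionary is mainly useful if you also want to keep the directory-to-label mapping around.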