I have a tricky problem selecting columns in a dataframe. Several columns in my dataframe share the same name, "PTime".
This is my dataframe:
   PTime  first_column  PTime  third_column  PTime  fourth_column
0      4   first_value      1   first_value      6    first_value
1      4  second_value      2  second_value      7   second_value
This is what I want:
   PTime  first_column  PTime  fourth_column
0      4   first_value      6    first_value
1      4  second_value      7   second_value
I will select my columns from a list:
My code:
import pandas as pd

# Note: a dict literal keeps only the last value for a repeated key,
# so this frame ends up with a single 'PTime' column.
data = {'PTime': ['1', '1'],
        'first_column': ['first_value', 'second_value'],
        'PTime': ['2', '2'],
        'third_column': ['first_value', 'second_value'],
        'PTime': ['4', '4'],
        'fourth_column': ['first_value', 'second_value'],
        }
list_c = ['PTime', 'first_column', 'fourth_column']
df = pd.DataFrame(data)
#df = df[df.columns.intersection(list_c)]
df = df[list_c]
df
So my goal is to select each column that is in the list, plus the column immediately to its left. If you have any idea how to do that, thank you very much. Regards
CodePudding user response:
I don't know exactly how to get the column to the left of one in the list, but I have a trick to get the table you want, shown here:
   PTime  first_column  PTime  fourth_column
0      4   first_value      6    first_value
1      4  second_value      7   second_value
What we can do is simply remove the unwanted columns by index. But because several columns share the same name here, df.columns[i] resolves to that shared label, and pandas will drop every column carrying that name. So rename the duplicated columns first, and then use indexing to delete columns.
Find some logic to rename them, e.g. PTime1, PTime2, PTime3, and then use indexes to remove the ones you don't need:
df.drop(df.columns[i], axis=1,inplace=True)
or
df = df.drop(df.columns[i], axis=1)
To drop several columns at once, pass a list of indices. In your case, after renaming the columns, it would be:
df.drop(df.columns[[2, 3]], axis=1)
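As a rough sketch of the rename-then-drop idea above: the snippet below builds a frame that really does have three "PTime" columns (by passing the names through `columns=` rather than a dict), numbers the duplicates, and then drops positions 2 and 3. The cell values come from the question's code and are only illustrative, and the renaming loop is just one possible way to do it, not a built-in pandas feature.

```python
import pandas as pd

# Passing the names via `columns=` keeps the duplicates (a dict would not).
df = pd.DataFrame(
    [["1", "first_value", "2", "first_value", "4", "first_value"],
     ["1", "second_value", "2", "second_value", "4", "second_value"]],
    columns=["PTime", "first_column", "PTime",
             "third_column", "PTime", "fourth_column"],
)

# Append a running number to every name that occurs more than once.
seen = {}
new_names = []
for name in df.columns:
    seen[name] = seen.get(name, 0) + 1
    is_duplicated = (df.columns == name).sum() > 1
    new_names.append(f"{name}{seen[name]}" if is_duplicated else name)
df.columns = new_names  # PTime1, first_column, PTime2, third_column, PTime3, fourth_column

# Positions 2 and 3 are now PTime2 and third_column; drop them by position.
result = df.drop(df.columns[[2, 3]], axis=1)
print(result)
```

Because the names are unique after renaming, dropping `df.columns[[2, 3]]` removes exactly those two columns and nothing else.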
CodePudding user response:
In my real dataframe I will not have multiple columns with the same name; all names will be distinct.
So if I have ten columns to select, it will be tedious to write them all out in a list.
import pandas as pd

data = {'PTime1': ['1', '1'],
        'first_column': ['first_value', 'second_value'],
        'PTime2': ['2', '2'],
        'third_column': ['first_value', 'second_value'],
        'PTime3': ['4', '4'],
        'fourth_column': ['first_value', 'second_value'],
        }
list_c = ['first_column', 'fourth_column']  # columns to select
df = pd.DataFrame(data)                     # create the dataframe
list_index = []                             # positions of the columns to keep
for col in list_c:
    index_no = df.columns.get_loc(col)      # position of the listed column
    # Caution: if index_no is 0, index_no - 1 is -1, which selects the last column.
    list_index.append(index_no - 1)         # column to the left of it
    list_index.append(index_no)             # the listed column itself
df = df.iloc[:, list_index]                 # subset the dataframe by position
df
This way I can select each column in my list together with the column to its left.
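For what it's worth, the same idea can be wrapped in a small helper that also copes with duplicated names and with a listed column sitting at position 0. This is only a sketch: `select_with_left_neighbor` is a hypothetical name, and the demo frame reuses the values from the question. It works on positions found with `Index.isin`, so the result follows the frame's column order rather than the order of the list.

```python
import numpy as np
import pandas as pd

def select_with_left_neighbor(df, wanted):
    """Return each column named in `wanted`, preceded by the column to its left."""
    positions = []
    # Integer positions of every column whose name is in `wanted`.
    for pos in np.flatnonzero(df.columns.isin(wanted)):
        if pos > 0:                  # the first column has no left neighbor
            positions.append(pos - 1)
        positions.append(pos)
    return df.iloc[:, positions]

# Demo frame with genuinely duplicated "PTime" columns.
df_demo = pd.DataFrame(
    [["1", "first_value", "2", "first_value", "4", "first_value"],
     ["1", "second_value", "2", "second_value", "4", "second_value"]],
    columns=["PTime", "first_column", "PTime",
             "third_column", "PTime", "fourth_column"],
)

picked = select_with_left_neighbor(df_demo, ["first_column", "fourth_column"])
print(picked)
```

Selecting by position with `iloc` sidesteps the ambiguity of duplicated labels, which is why the helper never looks a column up by name after the initial `isin` check.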