cols = list(ds.columns.values)
ds = ds[cols[1:3] + cols[5:6] + [cols[9]]]
print(ds)
Why did we convert to a list in this line: cols = list(ds.columns.values)?
CodePudding user response:
If ds is a DataFrame from Pandas:
type(ds.columns.values)
>>> <class 'numpy.ndarray'>
If you add two string (char) arrays in NumPy with +, the result is an element-wise concatenation:
a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['c', 'd'])
a1 + a2
>>> chararray(['ac', 'bd'], dtype='<U2')
and not:
np.char.array(['a', 'b', 'c', 'd'])
That is why you should convert it to a list, because + concatenates lists:
list1 = ['a','b']
list2 = ['c','d']
list1 + list2
>>> ['a','b','c','d']
Remember, selecting several columns from a pandas DataFrame needs a list of column labels, which is why you should pass it a list:
ds[[column1, column2, column5, column9]]
CodePudding user response:
ds.columns.values returns an ndarray, so slicing it will also produce ndarrays. The + operator between ndarrays behaves differently than between lists:
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [1, 2, 3, 4], 'col3': [1, 2, 3, 4], 'col4': [1, 2, 3, 4],
'col5': [1, 2, 3, 4], 'col6': [1, 2, 3, 4], 'col7': [1, 2, 3, 4], 'col8': [1, 2, 3, 4]})
cols_arr = df.columns.values
cols_list = list(df.columns.values)
print(cols_arr[0:2] + cols_arr[3:4] + [cols_arr[7]])
print(cols_list[0:2] + cols_list[3:4] + [cols_list[7]])
Output
['col1col4col8' 'col2col4col8']
['col1', 'col2', 'col4', 'col8']
When you try to index the dataframe with the first result, df[cols_arr[0:2] + cols_arr[3:4] + [cols_arr[7]]], you will get
KeyError: "None of [Index(['col1col4col8', 'col2col4col8'], dtype='object')] are in the [columns]"
With the lists, df[cols_list[0:2] + cols_list[3:4] + [cols_list[7]]], you will get the new dataframe
   col1  col2  col4  col8
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
3     4     4     4     4
CodePudding user response:
If you slice a single numpy.ndarray or a single list, you can select from the dataframe:
cols = ds.columns.values #numpy.ndarray
ds = ds[cols[1:3]] #ok
cols = ds.columns.tolist() #list
ds = ds[cols[1:3]] #ok
However, if you use the + operator, the behavior differs between numpy.ndarray and list:
cols = ds.columns.values #numpy.ndarray
ds = ds[cols[1:3] + cols[5:6]] #ERROR
cols = ds.columns.tolist() #list
ds = ds[cols[1:3] + cols[5:6]] #ok
That is because the + operator is "concatenation" for list, whereas for numpy.ndarray the + operator is numpy.add.
In other words, cols[1:3] + cols[5:6] is actually doing np.add(cols[1:3], cols[5:6]).
Refer to documentation for more details.
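As a rough illustration of the difference described above, here is a minimal sketch. The object array stands in for what ds.columns.values would return, and the labels c0..c9 are made up for the example; np.concatenate is shown only as the ndarray counterpart of list concatenation, not as something the original code used.

```python
import numpy as np

# Stand-in for ds.columns.values: an object-dtype array of hypothetical labels.
cols_arr = np.array([f"c{i}" for i in range(10)], dtype=object)
cols_list = cols_arr.tolist()  # same labels as a plain Python list

# ndarray: + is element-wise addition; the 1-element slice is broadcast.
added = cols_arr[1:3] + cols_arr[5:6]
same_as_np_add = np.add(cols_arr[1:3], cols_arr[5:6])

# list: + concatenates the two slices.
joined = cols_list[1:3] + cols_list[5:6]

# np.concatenate joins arrays end to end, like + does for lists.
concatenated = np.concatenate([cols_arr[1:3], cols_arr[5:6]])

print(added.tolist())         # ['c1c5', 'c2c5']
print(joined)                 # ['c1', 'c2', 'c5']
print(concatenated.tolist())  # ['c1', 'c2', 'c5']
```

The fused labels such as 'c1c5' are not column names, which is why indexing the dataframe with the ndarray result fails.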
CodePudding user response:
A simpler way to convert columns into a list:
ds.columns.tolist()
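But this also seems unnecessary. ds.columns returns an Index. You can slice an Index just like a normal list, and then combine the slices using .union. Note that .union expects an Index or list-like argument, so take the last column as the slice cols[9:10], and pass sort=False to keep the original column order:
cols = ds.columns
ds = ds[cols[1:3].union(cols[5:6], sort=False).union(cols[9:10], sort=False)]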
Note that you can use .iloc to reach your goal in a more idiomatic way:
ds = ds.iloc[:, [1, 2, 5, 9]]
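To check that the positional selection matches the label-based one from the question, here is a small sketch. The one-row frame and the column names c0..c9 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical one-row frame with ten columns named c0..c9.
ds = pd.DataFrame([list(range(10))], columns=[f"c{i}" for i in range(10)])

# Label-based selection: concatenate slices of a plain list of labels.
cols = ds.columns.tolist()
by_label = ds[cols[1:3] + cols[5:6] + [cols[9]]]

# Position-based selection: the same columns picked by integer position.
by_position = ds.iloc[:, [1, 2, 5, 9]]

print(by_label.columns.tolist())     # ['c1', 'c2', 'c5', 'c9']
print(by_label.equals(by_position))  # True
```

The .iloc form avoids building the label list at all, at the cost of hard-coding positions.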