I would like to know how to print only specific columns in pandas.

cols = list(ds.columns.values)                # column labels as a Python list
ds = ds[cols[1:3] + cols[5:6] + [cols[9]]]    # columns at positions 1, 2, 5 and 9
print(ds)

Why do we need to convert to a list in the line cols = list(ds.columns.values)?

Answer 1:

If ds is a pandas DataFrame, ds.columns.values is a NumPy array:

type(ds.columns.values)
>>> <class 'numpy.ndarray'>

If you add two NumPy arrays of strings with +, the strings are concatenated element by element:

import numpy as np

a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['c', 'd'])
a1 + a2
>>> chararray(['ac', 'bd'], dtype='<U2')

and not:

np.char.array(['a', 'b', 'c', 'd'])

That is why you should convert it to a list: for Python lists, + concatenates them:

   list1 = ['a', 'b']
   list2 = ['c', 'd']
   list1 + list2
   >>> ['a', 'b', 'c', 'd']
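The example above uses np.char.array, but for string column labels ds.columns.values is usually a plain ndarray with dtype=object. A minimal sketch, using a hand-built object array as an assumed stand-in for ds.columns.values, showing that + is still element-wise there while the list version concatenates:

```python
import numpy as np

# Stand-in for ds.columns.values: an object-dtype array of string labels.
labels = np.array(['a', 'b', 'c', 'd'], dtype=object)

arr_sum = labels[0:2] + labels[2:4]               # element-wise: 'a'+'c', 'b'+'d'
list_sum = list(labels[0:2]) + list(labels[2:4])  # list concatenation

print(arr_sum)   # ['ac' 'bd']
print(list_sum)  # ['a', 'b', 'c', 'd']
```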

Remember that to select several columns, you index the DataFrame with a list of column labels, which is why you should pass it a list:

ds[[column1, column2, column5, column9]]
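Applied to the selection from the question, a minimal sketch; the 10-column DataFrame and its labels c0 to c9 are hypothetical stand-ins for ds:

```python
import pandas as pd

# Hypothetical 10-column DataFrame standing in for the question's ds.
ds = pd.DataFrame([list(range(10))], columns=[f'c{i}' for i in range(10)])

cols = list(ds.columns.values)                # plain Python list of labels
selected = cols[1:3] + cols[5:6] + [cols[9]]  # list concatenation
print(selected)      # ['c1', 'c2', 'c5', 'c9']
print(ds[selected])  # DataFrame with only those four columns
```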

Answer 2:

ds.columns.values returns an ndarray, so slicing it also produces ndarrays, and + between ndarrays behaves differently than + between lists:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [1, 2, 3, 4], 'col3': [1, 2, 3, 4], 'col4': [1, 2, 3, 4],
                   'col5': [1, 2, 3, 4], 'col6': [1, 2, 3, 4], 'col7': [1, 2, 3, 4], 'col8': [1, 2, 3, 4]})

cols_arr = df.columns.values
cols_list = list(df.columns.values)

print(cols_arr[0:2] + cols_arr[3:4] + [cols_arr[7]])     # ndarray: element-wise addition
print(cols_list[0:2] + cols_list[3:4] + [cols_list[7]])  # list: concatenation

Output

['col1col4col8' 'col2col4col8']
['col1', 'col2', 'col4', 'col8']

When you try to select from the DataFrame with the first result, df[cols_arr[0:2] + cols_arr[3:4] + [cols_arr[7]]], you will get:

KeyError: "None of [Index(['col1col4col8', 'col2col4col8'], dtype='object')] are in the [columns]"

With the lists, df[cols_list[0:2] + cols_list[3:4] + [cols_list[7]]] returns the new DataFrame:

   col1  col2  col4  col8
0     1     1     1     1
1     2     2     2     2
2     3     3     3     3
3     4     4     4     4
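If you would rather keep cols_arr as an array, np.concatenate joins the slices end to end instead of adding them element-wise. A minimal sketch, rebuilding the same eight-column df so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as above: eight columns col1..col8, each holding [1, 2, 3, 4].
df = pd.DataFrame({f'col{i}': [1, 2, 3, 4] for i in range(1, 9)})
cols_arr = df.columns.values

# np.concatenate joins the pieces end to end, like + does for lists.
selected = np.concatenate([cols_arr[0:2], cols_arr[3:4], [cols_arr[7]]])
print(df[selected])  # same four columns as the list version
```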

Answer 3:

If you index with a single slice, of either a numpy.ndarray or a list, you get the expected DataFrame:

cols = ds.columns.values          # numpy.ndarray
subset = ds[cols[1:3]]            # ok

cols = ds.columns.tolist()        # list
subset = ds[cols[1:3]]            # ok

However, if you combine slices with the + operator, numpy.ndarray and list behave differently:

cols = ds.columns.values              # numpy.ndarray
subset = ds[cols[1:3] + cols[5:6]]    # ERROR: labels are added element-wise

cols = ds.columns.tolist()            # list
subset = ds[cols[1:3] + cols[5:6]]    # ok

That is because + means concatenation for a list, whereas for a numpy.ndarray it means element-wise addition (numpy.add).

In other words, cols[1:3] + cols[5:6] actually computes np.add(cols[1:3], cols[5:6]); here the length-1 slice is broadcast against both elements of the longer one.

Refer to the NumPy documentation for numpy.add and broadcasting for more details.
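A quick check of that equivalence, sketched with a hand-built object array as an assumed stand-in for cols; because cols[5:6] has length 1, NumPy broadcasts it against each element of cols[1:3]:

```python
import numpy as np

# Stand-in for ds.columns.values with six string labels.
cols = np.array(['c0', 'c1', 'c2', 'c3', 'c4', 'c5'], dtype=object)

via_operator = cols[1:3] + cols[5:6]        # what the + operator does
via_add = np.add(cols[1:3], cols[5:6])      # the explicit ufunc call

print(via_operator)                          # ['c1c5' 'c2c5']
print(np.array_equal(via_operator, via_add)) # True
```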

Answer 4:

A simpler way to convert columns into a list:

ds.columns.tolist()
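There is one small difference between the two conversions: for non-string labels, list(ds.columns.values) keeps NumPy scalar types, while tolist() returns native Python objects. A minimal sketch, assuming a hypothetical DataFrame with integer column labels:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with integer column labels.
ds = pd.DataFrame([[10, 20, 30]], columns=[0, 1, 2])

via_list = list(ds.columns.values)   # elements are NumPy integers
via_tolist = ds.columns.tolist()     # elements are Python ints

print(type(via_list[0]))       # a NumPy integer type such as numpy.int64
print(type(via_tolist[0]))     # <class 'int'>
print(via_list == via_tolist)  # True: the values still compare equal
```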

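But converting to a list at all seems unnecessary. ds.columns returns an Index, which you can slice just like a normal list; the slices can then be combined with .append (.union also works, but by default it sorts the labels and drops duplicates, so the column order may change):

cols = ds.columns
ds = ds[cols[1:3].append(cols[5:6]).append(cols[9:10])]  # cols[9:10] stays an Index; a scalar label cannot be appended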
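To see why the order matters when combining Index slices, a sketch with hypothetical labels that are deliberately not in alphabetical order; append keeps the positional order, while union with its default sort=None returns the labels sorted:

```python
import pandas as pd

# Hypothetical labels z, y, x, ..., q: not in alphabetical order.
ds = pd.DataFrame([list(range(10))], columns=list('zyxwvutsrq'))
cols = ds.columns

by_append = cols[1:3].append(cols[5:6]).append(cols[9:10])
by_union = cols[1:3].union(cols[5:6]).union(cols[9:10])

print(list(by_append))  # ['y', 'x', 'u', 'q']  (positional order kept)
print(list(by_union))   # ['q', 'u', 'x', 'y']  (sorted)
```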

Note that you can use .iloc to reach your goal in a more idiomatic way:

ds = ds.iloc[:, [1, 2, 5, 9]]
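When the positions include ranges, NumPy's np.r_ can build the positional list for .iloc from slices and single integers. A minimal sketch with a hypothetical 10-column DataFrame standing in for ds:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-column DataFrame standing in for ds.
ds = pd.DataFrame([list(range(10))], columns=[f'c{i}' for i in range(10)])

positions = np.r_[1:3, 5, 9]    # slice 1:3 expands to 1, 2
subset = ds.iloc[:, positions]  # select columns by position
print(subset)
```

Here np.r_ produces the positions 1, 2, 5 and 9, the same columns that cols[1:3] + cols[5:6] + [cols[9]] selects in the question.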