I am dealing with a large dataframe (198,619 rows x 19,110 columns) and so am using the polars package to read in the tsv file. Pandas just takes too long.
However, I now face an issue: I want to transform each cell's value x by raising 2 to that power, i.e. 2^x.
I run the following line as an example:
df_copy = df
df_copy[:,1] = 2**df[:,1]
But I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/tmp/pbs.98503.hn-10-03/ipykernel_196334/3484346087.py in <module>
1 df_copy = df
----> 2 df_copy[:,1] = 2**df[:,1]
~/.local/lib/python3.9/site-packages/polars/internals/frame.py in __setitem__(self, key, value)
1845
1846 # dispatch to __setitem__ of Series to do modification
-> 1847 s[row_selection] = value
1848
1849 # now find the location to place series
~/.local/lib/python3.9/site-packages/polars/internals/series.py in __setitem__(self, key, value)
512 self.__setitem__([key], value)
513 else:
--> 514 raise ValueError(f'cannot use "{key}" for indexing')
515
516 def estimated_size(self) -> int:
ValueError: cannot use "slice(None, None, None)" for indexing
This should be simple but I can't figure it out as I'm new to Polars.
Answer:
The secret to harnessing the speed and flexibility of Polars is to learn to use Expressions. As such, you'll want to avoid Pandas-style indexing methods.
Let's start with this data:
import polars as pl
nbr_rows = 4
nbr_cols = 5
df = pl.DataFrame({
"col_" + str(col_nbr): pl.arange(col_nbr, nbr_rows + col_nbr, eager=True)
for col_nbr in range(0, nbr_cols)
})
df
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 0 ┆ 1 ┆ 2 ┆ 3 ┆ 4 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1 ┆ 2 ┆ 3 ┆ 4 ┆ 5 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 3 ┆ 4 ┆ 5 ┆ 6 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3 ┆ 4 ┆ 5 ┆ 6 ┆ 7 │
└───────┴───────┴───────┴───────┴───────┘
In Polars we would express your calculations as:
df_copy = df.select(pl.lit(2).pow(pl.all()).keep_name())
print(df_copy)
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 1.0 ┆ 2.0 ┆ 4.0 ┆ 8.0 ┆ 16.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.0 ┆ 4.0 ┆ 8.0 ┆ 16.0 ┆ 32.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 4.0 ┆ 8.0 ┆ 16.0 ┆ 32.0 ┆ 64.0 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 8.0 ┆ 16.0 ┆ 32.0 ┆ 64.0 ┆ 128.0 │
└───────┴───────┴───────┴───────┴───────┘