I want to change all string values in a column of a LazyFrame,
e.g. from "alles ok" to "ALLES OK".
I can see that a Series has a method for this:
polars.internals.series.StringNameSpace.to_uppercase
Q: What is the proper way to apply a string (or date) manipulation to just one column of a LazyFrame?
Q: Do I need to extract the column I want to work on as a Series and re-integrate it?
I can do math on the elements of a column and put the result in a new column, e.g.:
df.with_column((col("b") ** 2).alias("b_squared")).collect()
But how do I do the same with strings?
Answer:
OK, after some digging I was able to take a string column of a LazyFrame and convert it to the Datetime dtype.
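The `strict: false` option used in the conversion below means that strings which cannot be parsed become null instead of failing the whole query. The following is a std-only Rust sketch of that behaviour, not the Polars implementation: `parse_ymd` and `parse_column` are hypothetical helpers, a slice of `Option<&str>` stands in for a Polars string column, and only a basic range check is done (no days-per-month validation).

```rust
/// Hypothetical helper (not part of Polars): parse "%Y-%m-%d" strings into
/// (year, month, day) tuples. Anything that fails to parse yields None.
fn parse_ymd(s: &str) -> Option<(i32, u32, u32)> {
    let mut parts = s.split('-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    // Reject extra components such as "1900-01-02-03".
    if parts.next().is_some() {
        return None;
    }
    // Basic range check only; does not check days per month or leap years.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Apply the parser to a nullable column: existing nulls stay null and
/// unparseable strings become null, mimicking a non-strict strptime.
fn parse_column(col: &[Option<&str>]) -> Vec<Option<(i32, u32, u32)>> {
    col.iter().copied().map(|v| v.and_then(parse_ymd)).collect()
}

fn main() {
    let column_2 = [Some("1900-01-02"), Some("not a date"), None];
    // prints [Some((1900, 1, 2)), None, None]
    println!("{:?}", parse_column(&column_2));
}
```

With `strict: true` the equivalent behaviour would be to return an error on the first `None` produced by a non-null input, instead of keeping it as null.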
I also found a code snippet that applies a "len" function to the first column and adds the result as a new column. The same pattern, with a different function, converts the column to uppercase:
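use polars::prelude::*;

fn main() {
    let df: Result<DataFrame> = df!("column_1" => &["Tuesday"],
                                    "column_2" => &["1900-01-02"]);
    // note: the StrpTimeOptions fields differ between Polars versions
    let options = StrpTimeOptions {
        date_dtype: DataType::Datetime(TimeUnit::Milliseconds, None), // target dtype, no time zone
        fmt: Some("%Y-%m-%d".into()), // format of the input strings
        strict: false, // strings that fail to parse become null instead of raising an error
        exact: true,   // the format must match the whole string
    };
    // convert column_2 from string to Datetime; keeping the name replaces the column in place
    let days = df
        .unwrap()
        .lazy()
        .with_column(col("column_2").str().strptime(options));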
    // ### courtesy of Alex Moore-Niemi:
    // output dtype of the custom function, so the lazy schema is known before execution
    let o = GetOutput::from_type(DataType::UInt32);
    fn str_to_len(str_val: Series) -> Result<Series> {
        let x = str_val
            .utf8()? // propagate a dtype error instead of panicking
            .into_iter()
            // your actual custom function would be in this map
            .map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
            .collect::<UInt32Chunked>();
        Ok(x.into_series())
    }
    // ###
    // add a new column with the (byte) length of each string in column_1
    let days = days
        .with_column(col("column_1").apply(str_to_len, o).alias("new_column"))
        .collect()
        .unwrap();
    let o = GetOutput::from_type(DataType::Utf8);
    fn str_to_uppercase(str_val: Series) -> Result<Series> {
        let x = str_val
            .utf8()? // propagate a dtype error instead of panicking
            .into_iter()
            // your actual custom function would be in this map
            .map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.to_uppercase()))
            .collect::<Utf8Chunked>();
        Ok(x.into_series())
    }
    // column_1 to UPPERCASE, replacing the column in place
    let days = days
        .lazy()
        .with_column(col("column_1").apply(str_to_uppercase, o))
        .collect()
        .unwrap();
    println!("{}", days);
}
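Stripped of the Polars types, both custom functions are just a map over nullable values: `Some` strings are transformed and `None` (null) stays `None`. A std-only sketch of that core logic, assuming slices of `Option<&str>` as a stand-in for a `Utf8Chunked`; `map_len` and `map_uppercase` are hypothetical names, not Polars API:

```rust
/// Byte length of every non-null string; nulls stay null.
/// `len()` counts UTF-8 bytes, just like `name.len() as u32` in the answer.
fn map_len(values: &[Option<&str>]) -> Vec<Option<u32>> {
    values.iter().map(|v| v.map(|s| s.len() as u32)).collect()
}

/// Uppercase every non-null string; nulls stay null.
/// `str::to_uppercase` is Unicode-aware, so "ß" becomes "SS".
fn map_uppercase(values: &[Option<&str>]) -> Vec<Option<String>> {
    values.iter().map(|v| v.map(|s| s.to_uppercase())).collect()
}

fn main() {
    let column_1 = [Some("alles ok"), None, Some("Tuesday")];
    // prints [Some(8), None, Some(7)]
    println!("{:?}", map_len(&column_1));
    // prints [Some("ALLES OK"), None, Some("TUESDAY")]
    println!("{:?}", map_uppercase(&column_1));
}
```

For plain uppercasing, Polars also provides a built-in string expression (`col("column_1").str().to_uppercase()` in Rust, `pl.col("column_1").str.to_uppercase()` in Python), which avoids the custom function and the `GetOutput` bookkeeping; check the API docs for the Polars version you use, since names have shifted between releases.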