LazyFrame: How to do string manipulation on values in a single column

Time:08-11

I want to change all string values in a single column of a LazyFrame.

e.g. from "alles ok" ==> to "ALLES OK"

I see that a Series has a method for this:

polars.internals.series.StringNameSpace.to_uppercase

Q: What is the proper way to apply a string (or Date) manipulation on just one column in a LazyFrame?

Q: Do I need to extract the column I want to work on as a series and re-integrate it?

I can do math on elements of a column and put the result in a new column e.g.:

df.with_column((col("b") ** 2).alias("b_squared")).collect() 

But how do I do the same with strings?

Answer:

OK, after some digging I was able to take a string column of a LazyFrame and convert it to dtype(datetime) in place.

I also found a code snippet that applies a "len" function to the first column and stores the result in a new column. The same pattern, with a different closure, converts the column to uppercase:

use polars::prelude::*;

fn main() {
    let df: Result<DataFrame> = df!("column_1" => &["Tuesday"],
                                "column_2" => &["1900-01-02"]);

    let options = StrpTimeOptions {
        date_dtype: DataType::Datetime(TimeUnit::Milliseconds, None),
        fmt: Some("%Y-%m-%d".into()),
        strict: false,
        exact: true,
    };

    // in-place convert string into dtype(datetime)
    let days = df
        .unwrap()
        .lazy()
        .with_column(col("column_2").str().strptime(options));

    // ### courtesy of Alex Moore-Niemi:
    let o = GetOutput::from_type(DataType::UInt32);
    fn str_to_len(str_val: Series) -> Result<Series> {
        let x = str_val
            .utf8()
            .unwrap()
            .into_iter()
            // your actual custom function would be in this map
            .map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.len() as u32))
            .collect::<UInt32Chunked>();
        Ok(x.into_series())
    }
    // ###

    // add new column with length of string in column_1
    let days = days
        .with_column(col("column_1").alias("new_column").apply(str_to_len, o))
        .collect()
        .unwrap();

    let o = GetOutput::from_type(DataType::Utf8);
    fn str_to_uppercase(str_val: Series) -> Result<Series> {
        let x = str_val
            .utf8()
            .unwrap()
            .into_iter()
            // your actual custom function would be in this map
            .map(|opt_name: Option<&str>| opt_name.map(|name: &str| name.to_uppercase()))
            .collect::<Utf8Chunked>();
        Ok(x.into_series())
    }

    // column_1 to UPPERCASE ... in-place
    let days = days
        .lazy()
        .with_column(col("column_1").apply(str_to_uppercase, o))
        .collect()
        .unwrap();

    println!("{}", days);
}
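
The two custom functions above differ only in the closure passed to the inner map. As a minimal sketch of that per-value logic, using only the standard library and no Polars types, the helpers below (uppercase_all and byte_lengths are hypothetical names, not part of any library) take a slice of optional strings in place of a Utf8 column. They show that a null value (None) passes through untouched, and that str::len counts bytes rather than characters.

```rust
/// Uppercase every present value; `None` (a null cell) stays `None`.
/// Hypothetical helper mirroring the closure in `str_to_uppercase`.
fn uppercase_all(values: &[Option<&str>]) -> Vec<Option<String>> {
    values
        .iter()
        .copied()
        .map(|opt| opt.map(|s| s.to_uppercase()))
        .collect()
}

/// Length of every present value in bytes (what `str::len` returns).
/// Hypothetical helper mirroring the closure in `str_to_len`.
fn byte_lengths(values: &[Option<&str>]) -> Vec<Option<u32>> {
    values
        .iter()
        .copied()
        .map(|opt| opt.map(|s| s.len() as u32))
        .collect()
}
```

For ASCII input such as "Tuesday" the byte length equals the character count, but for text with umlauts it does not; s.chars().count() would be the closure to use if characters are what you want. Depending on the Polars version and enabled features, the expression string namespace (col("column_1").str()) may also offer a built-in uppercase method, which would avoid the custom function entirely; check the documentation for your version.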
