Filter Expressions

Filter expressions let you filter data with Polars expression syntax before a card computes its aggregation or an interactive component computes its options. They are available on card and interactive components via the filter_expr YAML field.

Introduced in v0.8.0-b2

Filter expressions were added in PR #710.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                    Data Pipeline                                │
│                                                                 │
│  Delta Table ──▶ Interactive Filters ──▶ filter_expr ──▶ Result │
│                  (user selections)       (YAML-defined)         │
│                                                                 │
│  Card:        filter_expr narrows data before aggregation       │
│  Interactive: filter_expr scopes dropdown options/slider range  │
└─────────────────────────────────────────────────────────────────┘

Key behaviors:

  • filter_expr applies on top of interactive filters (dual-layer)
  • On cards: narrows the dataset before computing the metric value
  • On interactive components: restricts available options/range to the filtered subset
  • Expressions are validated and executed in a sandboxed namespace for security
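
The dual-layer behavior can be pictured with a small plain-Python model. This is only an illustration of the semantics described above, not the project's implementation: the rows, the `interactive_filter` and `filter_expr` functions, and `card_count` are all hypothetical stand-ins.

```python
# Plain-Python model of the dual-layer filter; rows and predicates are made up.
rows = [
    {"sample_id": "S1", "batch": "A", "coverage": 45},
    {"sample_id": "S2", "batch": "A", "coverage": 12},
    {"sample_id": "S3", "batch": "B", "coverage": 38},
    {"sample_id": "S4", "batch": "B", "coverage": 31},
]


def interactive_filter(row):
    # Layer 1: what the user selected in the dashboard (here: batch "A").
    return row["batch"] == "A"


def filter_expr(row):
    # Layer 2: stands in for the YAML-defined col('coverage') >= 30.
    return row["coverage"] >= 30


def card_count(rows):
    # A row must pass both layers before it reaches the aggregation.
    kept = [r for r in rows if interactive_filter(r) and filter_expr(r)]
    return len(kept)


print(card_count(rows))  # 1
```

Only S1 passes both layers, so a count card would report 1 even though three rows satisfy the coverage condition on their own.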

Syntax

Expressions use Polars column syntax. Two constructors are available:

| Constructor | Description | Example |
|---|---|---|
| col('name') | Reference a column | col('coverage') |
| lit(value) | Explicit literal value | lit(30) |

Comparison Operators

filter_expr: "col('coverage') >= 30"
filter_expr: "col('quality') > 80"
filter_expr: "col('status') == 'passed'"
filter_expr: "col('status') != 'failed'"

Logical Operators

Combine conditions with & (AND), | (OR), ~ (NOT). Use parentheses for grouping:

# AND — both conditions must be true
filter_expr: "(col('coverage') >= 30) & (col('quality') > 80)"

# OR — either condition
filter_expr: "(col('type') == 'tumor') | (col('type') == 'metastasis')"

# NOT — negate a condition
filter_expr: "~col('sample_type').is_in(['control', 'blank'])"
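
The parentheses matter because, in Python syntax, & and | bind more tightly than comparison operators. The sketch below uses only the standard-library ast module to show how the two spellings parse; `top_level_node` is a helper name made up for this illustration.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the class name of the outermost AST node of an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


with_parens = "(col('coverage') >= 30) & (col('quality') > 80)"
without_parens = "col('coverage') >= 30 & col('quality') > 80"

# Parenthesized: an AND (BinOp) whose two operands are comparisons.
print(top_level_node(with_parens))     # BinOp

# Unparenthesized: parsed as the chained comparison
# col('coverage') >= (30 & col('quality')) > 80, which is not the intended filter.
print(top_level_node(without_parens))  # Compare
```

Without the parentheses the expression is a chained comparison around `30 & col('quality')`, so it will typically fail or filter on something other than what was meant.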

Membership & Null Checks

| Method | Description | Example |
|---|---|---|
| .is_in([...]) | Value in list | col('biome').is_in(['forest', 'ocean']) |
| .is_null() | Value is null | col('score').is_null() |
| .is_not_null() | Value is not null | col('score').is_not_null() |
| .is_between(lo, hi) | Value in range | col('expression').is_between(1.0, 100.0) |

String Methods

Access via the .str namespace:

| Method | Description | Example |
|---|---|---|
| .str.contains(pat) | Substring match | col('gene').str.contains('HOX') |
| .str.starts_with(pfx) | Prefix match | col('id').str.starts_with('SAMPLE_') |
| .str.ends_with(sfx) | Suffix match | col('file').str.ends_with('.fastq') |
| .str.to_lowercase() | Convert to lowercase | col('name').str.to_lowercase() |
| .str.to_uppercase() | Convert to uppercase | col('name').str.to_uppercase() |
| .str.strip() | Trim whitespace | col('name').str.strip() |

Date/Time Accessors

Access via the .dt namespace:

| Method | Description | Example |
|---|---|---|
| .dt.year() | Extract year | col('run_date').dt.year() == 2026 |
| .dt.month() | Extract month | col('run_date').dt.month() >= 6 |
| .dt.day() | Extract day | col('run_date').dt.day() == 1 |

Window Functions

Broadcast an aggregation across groups with .over('group_column'). This enables group-level filtering — keep or discard entire groups based on an aggregate property:

| Pattern | Description |
|---|---|
| col('x').count().over('group') >= N | Groups with at least N rows |
| col('x').mean().over('group') > threshold | Groups with mean above threshold |
| col('x').std().over('group') < threshold | Low-variance groups |
| col('x').min().over('group') > threshold | Groups where all values exceed threshold |

Available aggregation methods: .mean(), .sum(), .min(), .max(), .count(), .std(), .median()

# Only taxa with 100+ observations
filter_expr: "col('taxonomy').count().over('taxonomy') >= 100"

# Genes with mean expression > 1.0
filter_expr: "col('expression').mean().over('gene') > 1.0"

# Batches where minimum read depth exceeds 30
filter_expr: "col('read_depth').min().over('batch') > 30"
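
To see why a window expression keeps or drops whole groups, here is a plain-Python model of the count pattern. It only mimics the semantics (the aggregate is computed per group, then attached to every row of that group) and assumes no null values; the sample rows and the `keep_groups_with_min_rows` helper are hypothetical.

```python
from collections import Counter

# Made-up observations; each row belongs to one taxonomy group.
rows = [
    {"taxonomy": "Bacteroides", "abundance": 0.40},
    {"taxonomy": "Bacteroides", "abundance": 0.35},
    {"taxonomy": "Bacteroides", "abundance": 0.10},
    {"taxonomy": "Prevotella", "abundance": 0.15},
]


def keep_groups_with_min_rows(rows, group_key, min_rows):
    # Step 1: aggregate per group (the .count() part).
    counts = Counter(r[group_key] for r in rows)
    # Step 2: broadcast the group value back to each row (the .over() part)
    # and compare row by row, so a group is kept or dropped as a whole.
    return [r for r in rows if counts[r[group_key]] >= min_rows]


kept = keep_groups_with_min_rows(rows, "taxonomy", min_rows=2)
print(sorted({r["taxonomy"] for r in kept}))  # ['Bacteroides']
```

Every row of a group sees the same aggregate, so the comparison gives the same answer for all of them. That is what turns a row-level filter into a group-level one.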

Column-to-Column Comparison

Compare two columns directly:

filter_expr: "col('tumor_expr') > col('normal_expr')"
filter_expr: "col('sepal.length') > col('petal.length')"

Type Casting

filter_expr: "col('score').cast(float) > 0.5"

Usage on Cards

Add filter_expr to a card component to compute a conditional aggregation:

# Count only high-quality samples
- tag: hq-sample-count
  component_type: card
  workflow_tag: python/samples_workflow
  data_collection_tag: samples
  aggregation: count
  column_name: sample_id
  filter_expr: "(col('coverage') >= 30) & (col('contamination') < 0.05)"
  title: "HQ Samples"
  icon_name: mdi:check-circle
  icon_color: "#43A047"

filter_expr also works with multi-metric summary cards: the primary aggregation and all secondary metrics are computed on the same filtered data.

- tag: filtered-summary
  component_type: card
  aggregation: average
  aggregations: [median, std_dev, min, max]
  column_name: expression
  filter_expr: "col('gene_type') == 'protein_coding'"
  # ...

Usage on Interactive Components

Add filter_expr to an interactive component to scope its options:

# Only show varieties that have petal.length > 4
- tag: long-petal-varieties
  component_type: interactive
  interactive_component_type: MultiSelect
  column_name: variety
  filter_expr: "col('petal.length') > 4"
  title: "Varieties (petal > 4 cm)"
  # ...

Effect by component type:

| Component | Effect of filter_expr |
|---|---|
| Select / MultiSelect / SegmentedControl | Only shows unique values present in filtered data |
| Slider / RangeSlider | Adjusts min/max range to filtered data |
| DateRangePicker | Adjusts date range to filtered data |
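
The effects listed above can be modeled in a few lines of plain Python. This is a sketch of the described behavior only, with made-up iris-style rows; how the application actually derives options and ranges is not shown in this page.

```python
# Made-up rows; the filter below stands in for col('petal.length') > 4.
rows = [
    {"variety": "Setosa", "petal.length": 1.4},
    {"variety": "Versicolor", "petal.length": 4.7},
    {"variety": "Versicolor", "petal.length": 3.9},
    {"variety": "Virginica", "petal.length": 6.0},
]

filtered = [r for r in rows if r["petal.length"] > 4]

# Select / MultiSelect: options are the unique values left after filtering.
select_options = sorted({r["variety"] for r in filtered})

# Slider / RangeSlider: bounds are the min and max of the filtered column.
lengths = [r["petal.length"] for r in filtered]
slider_range = (min(lengths), max(lengths))

print(select_options)  # ['Versicolor', 'Virginica']
print(slider_range)    # (4.7, 6.0)
```

Setosa disappears from the options because none of its rows survive the filter, and the slider's lower bound moves up from 1.4 to 4.7.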

Security

Expressions are validated before execution. Only Polars column operations are allowed:

Allowed: col(), lit(), comparison operators, logical operators, the methods listed above

Blocked: import, exec, eval, open, os, sys, lambda, def, class, loops, dunder attributes, and all other Python builtins

Expressions run in a restricted namespace containing only col and lit — no access to the broader Python runtime.
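
One common way to implement this kind of validation is an allowlist check over the parsed syntax tree. The sketch below uses the standard-library ast module and is an assumption about how such a validator could look, not the project's actual code: `ALLOWED_NAMES`, `ALLOWED_NODES`, `validate_filter_expr`, and `is_valid` are hypothetical names, and it does not check method names against the documented list.

```python
import ast

# Hypothetical allowlists, modeled on the rules described above.
ALLOWED_NAMES = {"col", "lit"}
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Attribute, ast.Name, ast.Load,
    ast.Constant, ast.List, ast.Compare, ast.BinOp, ast.UnaryOp,
    ast.BitAnd, ast.BitOr, ast.Invert, ast.USub,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def validate_filter_expr(expr: str) -> None:
    """Raise ValueError or SyntaxError if expr uses anything off the allowlist."""
    # mode="eval" accepts a single expression only, so statements such as
    # import, def, class, and loops already fail here with SyntaxError.
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            raise ValueError(f"disallowed name: {node.id}")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"disallowed attribute: {node.attr}")


def is_valid(expr: str) -> bool:
    try:
        validate_filter_expr(expr)
    except (ValueError, SyntaxError):
        return False
    return True


print(is_valid("(col('coverage') >= 30) & (col('quality') > 80)"))  # True
print(is_valid("__import__('os').system('ls')"))                    # False
print(is_valid("col('x').__class__"))                               # False
```

In a design like this, an expression that passes validation would then be evaluated with a namespace that holds only col and lit, which matches the restricted namespace described above.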

See Also