📊 Depictio Project Types: Choose Your Data Strategy¶
From simple CSV files to complex bioinformatics pipelines - discover which project type fits your needs.
🎬 Project Management Overview: Discover how Depictio's project types organize your data workflow - from simple file uploads to complex bioinformatics pipelines
General features of Depictio Projects¶
Projects are the top-level entity in Depictio, designed to help you organize your data and structure your analysis. Each project can contain multiple workflows and data collections (advanced type), which are essentially groups of related data files relevant to your analysis. Users can then create interactive dashboards from these data collections, allowing for flexible and powerful data visualization.
General features of Depictio projects include:
- Data Organization: Projects help you structure your data, making it easier to manage and analyze.
- Workflow Integration (advanced): Projects can be linked to specific workflows, allowing you to track and manage your data processing pipelines.
- Interactive Dashboards: Create visualizations and dashboards based on the data within your projects.
- Role Management: Control access to projects, related data, and resulting dashboards through user roles and permissions. Each user can be an Owner, an Editor, or a Viewer of a project.
🎯 Two Approaches, Many Possibilities¶
Depictio offers two distinct project types designed for different data scenarios. Think of them as two strategies for organizing your data - each optimized for specific use cases.
Whether you're looking for a Plotly-Studio-like experience for immediate analysis of a tabular dataset, or you're a bioinformatician managing complex pipeline outputs, choosing the right project type sets the foundation for effective data exploration.
📊 Basic Projects: Your Gateway to Interactive Dashboards¶
Perfect for: Direct data analysis
Basic projects are designed for users who want to quickly visualize and explore tabular data without the overhead of complex configurations. They are ideal for one-off analyses or exploratory data visualization.
Main Features of Basic Projects¶
- Start Quickly - Upload files and start visualizing within minutes
- Minimal configuration required - Works with any tabular data format
- Perfect for exploration - Ideal when you want to "see what the data tells you"
🎬 Basic Project Creation: Watch how to create a basic project from scratch - upload data, configure settings, and start visualizing in minutes
🧬 Advanced Projects: Suited for Bioinformatics Workflows¶
Perfect for: Standardized workflow outputs and multi-sample studies
Advanced projects are tailored for bioinformatics workflows executed in core-facility-like setups. Users can discover and organize files generated by complex pipelines such as nf-core, Snakemake, or Nextflow in a structured, centralized manner, in order to create interactive dashboards that aggregate data across multiple samples, timepoints, or experimental conditions.
Main Features of Advanced Projects compared to Basic Projects¶
- Sequencing-run data organization - Automatically finds and organizes files based on naming conventions and directory structures, perfect for core facilities managing recurrent sequencing projects that rely on standardized processing pipelines.
- Multi-sample analysis - Handles large datasets with hundreds of samples, automatically aggregating results across multiple runs or timepoints.
- Combine multiple data collections - Joins different data types (e.g., gene expression, variant calls) into unified dashboards, and ingests data without modifying the original files at the workflow level - any post-processing happens inside Depictio.
📋 Quick Decision Guide¶
Choose Basic when:¶
- ✅ You have a limited number of files ready to analyze (CSV, Excel, Parquet)
- ✅ One-time analysis or ad-hoc exploration
- ✅ Manual data preparation is acceptable
- ✅ Quick insights are the primary goal
Choose Advanced when:¶
- ✅ Automated pipeline generates your data (nf-core, Snakemake, etc.)
- ✅ Standardized file organization and naming conventions exist
- ✅ You need to aggregate data across multiple samples or runs
- ✅ Regular data updates are expected
🚀 Getting Started Paths¶
Basic Project Quickstart¶
Through the Web Interface:
- Visit demo.depictio.embl.org
- Click "Create Project" β Choose "Basic" and fill in project details
- Travels to your newly created project and click "Create Data Collection"
- Fill in details and upload your tabular file as data collection
- Go to the "Dashboards" tab
- Start creating dashboard components from your project's data collections
Common File Formats Supported:
- CSV files (most common) & TSV files (tab-separated values)
- Excel spreadsheets (.xlsx, .xls)
- Parquet files (efficient for large datasets)
- Feather files (.feather)
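Any of these formats can back a data collection. For instance, a minimal CSV like the one below (hypothetical example data) is already enough to start building a dashboard:

```csv
sample_id,total_reads,mapped_reads,quality_score
sample_A,1200000,1150000,36.2
sample_B,980000,920000,34.8
sample_C,1050000,1010000,35.5
```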
Advanced Project Quickstart¶
Project Structure Setup: Advanced projects require a YAML configuration file that describes your data organization patterns. This tells Depictio how to automatically discover and organize your files.
Example Study Structure:
```
study_directory/
├── depictio_project.yaml                # Depictio configuration
├── run_001/                             # First batch of samples - run 001
│   ├── sample_A/
│   │   ├── stats/
│   │   │   └── sample_A_stats.tsv       # Statistics for sample A
│   │   └── analysis_results/
│   │       └── sample_A_analysis.tsv    # Analysis results for sample A
│   └── sample_B/
│       ├── stats/
│       │   └── sample_B_stats.tsv       # Statistics for sample B
│       └── analysis_results/
│           └── sample_B_analysis.tsv
└── run_002/                             # Second batch of samples - run 002
    ├── sample_C/
    │   ├── stats/
    │   │   └── sample_C_stats.tsv
    │   └── analysis_results/
    │       └── sample_C_analysis.tsv
    └── sample_D/
        ├── stats/
        │   └── sample_D_stats.tsv
        └── analysis_results/
            └── sample_D_analysis.tsv
```
The configuration file below describes patterns for finding and organizing these files automatically.
Example depictio_project.yaml:
```yaml
# =============================================================================
# DEPICTIO PROJECT CONFIGURATION
# Complete configuration for the study structure shown above
# =============================================================================

# Project identification - displayed in the Depictio web interface
name: "My Bioinformatics Study"

# Description of the project
description: "A comprehensive study of multiple samples with detailed statistics and analysis results"

# Project type
# This is an advanced project with structured data discovery
type: "advanced"

# Data management platform project URL
# Optional: link to an external project management system
data_management_platform_project_url: "https://example.com/project/my-bioinformatics-study"

# Public/private visibility
# Set to true if you want this project to be publicly accessible
is_public: false

# =============================================================================
# WORKFLOW DEFINITIONS
# Define the pipelines that generated your data
# =============================================================================
workflows:
  - name: "bioinformatics_pipeline"
    # Engine that executed the workflow
    engine:
      name: "nextflow"      # Workflow management system used
      version: "24.10.3"    # Version for reproducibility
    description: "Multi-sample bioinformatics analysis pipeline"
    version: "3.1"          # Your pipeline version

    # =========================================================================
    # DATA DISCOVERY CONFIGURATION
    # Tell Depictio where to find your data and how it's organized
    # =========================================================================
    config:
      # Where your workflow output directories are located
      parent_runs_location:
        # Environment variables (like {DATA_LOCATION}) are resolved at runtime
        # This allows flexible deployment across different systems
        - "{DATA_LOCATION}/study_directory"
      # Regular expression to identify run directories
      # "run_.*" matches: run_001, run_002, run_abc, etc.
      runs_regex: "run_.*"

    # =========================================================================
    # DATA COLLECTIONS
    # Define the different types of data files to be ingested
    # =========================================================================
    data_collections:
      # COLLECTION 1: Sample Statistics
      # ========================================
      - data_collection_tag: "sample_stats"
        description: "Statistics for each sample"
        config:
          # Table = tabular data (CSV, TSV, Excel, etc.)
          # Future alternatives: JBrowse2, GeoJSON, etc.
          type: "Table"
          # Aggregate = combine multiple files into one dataset
          # Alternative: Metadata = single file per run
          metatype: "Aggregate"

          # File discovery settings
          scan:
            # recursive = search through subdirectories
            # single = look for one specific file per run
            mode: "recursive"
            scan_parameters:
              regex_config:
                # Find all files matching this pattern within each run directory
                # Example matches: run_001/sample_A/stats/sample_A_stats.tsv
                #                  run_001/sample_B/stats/sample_B_stats.tsv
                pattern: "stats/.*_stats.tsv"

          # Data processing configuration specific to type Table
          dc_specific_properties:
            format: "TSV"            # Tab-separated values
            # Polars DataFrame configuration (high-performance data processing)
            polars_kwargs:
              separator: "\t"        # Tab separator
              has_header: true       # First row contains column names
              # Other options: skip_rows, column_types, etc.
            # Only keep these columns (improves performance and reduces memory)
            keep_columns:
              - "sample_id"          # Links samples across datasets
              - "total_reads"        # Sequencing depth metric
              - "mapped_reads"       # Alignment quality metric
              - "quality_score"      # Overall sample quality
            # Human-readable descriptions for dashboard tooltips
            columns_description:
              sample_id: "Unique sample identifier"
              total_reads: "Total number of sequencing reads"
              mapped_reads: "Successfully aligned reads"
              quality_score: "Overall sample quality metric"

      # COLLECTION 2: Analysis Results
      # ========================================
      - data_collection_tag: "analysis_results"
        description: "Analysis results for each sample"
        config:
          type: "Table"
          metatype: "Aggregate"
          scan:
            mode: "recursive"
            scan_parameters:
              regex_config:
                # Find analysis result files in each sample directory
                # Example: run_001/sample_A/analysis_results/sample_A_analysis.tsv
                pattern: "analysis_results/.*_analysis.tsv"
          dc_specific_properties:
            format: "TSV"
            polars_kwargs:
              separator: "\t"
              has_header: true
            keep_columns:
              - "sample_id"           # Join key for linking datasets
              - "gene_expression"     # Expression analysis results
              - "variant_count"       # Variant calling results
              - "pathway_enrichment"  # Functional analysis results

          # ===================================================================
          # DATA JOINING
          # Combine this collection with others for integrated analysis
          # ===================================================================
          join:
            # Columns used to match records between datasets
            on_columns:
              - "sample_id"           # Common identifier across collections
            # Type of join (inner = only samples present in both datasets)
            # Options: inner, left, right, outer
            how: "inner"
            # Other collections to join with
            with_dc:
              - "sample_stats"        # Combine analysis with quality metrics
```
🔧 Technical Deep Dive¶
CLI Workflow Commands¶
Once you have your configuration file, use the CLI to process your project:
```bash
# Complete workflow: validate → sync → scan → process
depictio-cli run --project-config-path ./depictio_project.yaml
```
This single command handles the entire pipeline automatically. For detailed CLI usage and individual step commands, see the CLI documentation.
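Note that the example configuration references {DATA_LOCATION}, which is resolved from your environment at runtime, so it has to be set before the command above can find your data. A minimal sketch (the variable name comes from the example config; how variables are supplied may vary across deployments):

```bash
# Point Depictio at the directory that contains study_directory/
export DATA_LOCATION=/path/to/data

# Validate, sync, scan, and process in one command
depictio-cli run --project-config-path ./depictio_project.yaml
```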
Data Processing Pipeline using depictio-cli¶
Using the `run` command, the CLI executes the following pipeline for advanced projects:
- ✅ Server Check - Verify connection to Depictio backend
- ✅ S3 Storage Check - Validate cloud storage configuration
- ✅ Config Validation - Ensure YAML structure is correct
- ✅ Config Sync - Register project with server
- ✅ File Scan - Discover files matching patterns
- ✅ Data Process - Convert files to Delta Lake format for dashboarding
Each step can be skipped with flags like `--skip-scan` or `--skip-process` for debugging.
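For instance, after tweaking column descriptions in the YAML, you might re-sync the configuration without re-ingesting any data - a sketch using the skip flags mentioned above (see the CLI documentation for exact semantics):

```bash
# Re-validate and re-register the config, skipping file scan and processing
depictio-cli run \
  --project-config-path ./depictio_project.yaml \
  --skip-scan \
  --skip-process
```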
🎬 🖥️ `depictio-cli run` command example
File Discovery Patterns¶
Depictio supports two main scanning modes that adapt to different data organization structures:
Single File Collection:
Finds one specific file per run directory. This suits cases where you have a single summary file per sample or run - typically metadata files or summary statistics generated once at the project level.
Recursive File Collection:
Uses regex patterns to find files at any depth in the directory structure, which fits per-sample outputs nested under each run.
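Condensed, the two modes differ only in the scan block of a data collection. The recursive form below mirrors the full example above; the single-mode parameters are an assumption for illustration - check the YAML reference for the exact keys:

```yaml
# Recursive scan: regex is matched against paths at any depth under each run
scan:
  mode: "recursive"
  scan_parameters:
    regex_config:
      pattern: "analysis_results/.*_analysis.tsv"
---
# Single scan: one specific file per run directory
# (the parameter name below is illustrative, not the confirmed schema)
scan:
  mode: "single"
  scan_parameters:
    filename: "run_summary.tsv"
```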
📊 Project Types Comparison¶
Choosing the right project type is crucial for your data analysis success. Here's a comprehensive comparison to help you decide:
| Feature | Basic Projects | Advanced Projects |
|---|---|---|
| Setup Complexity | Minimal - Web UI or CLI | CLI with YAML config required |
| Data Compatibility | Simple tabular data (CSV, Excel) | Complex bioinformatics workflows |
| Multi-sample Support | Limited to single datasets | Designed for hundreds of samples |
| Data Processing | Direct conversion to Delta table | Aggregation & joining capabilities |
| Best For | Quick analysis | Production workflows, core facilities |
| Learning Curve | Immediate - no learning required | Moderate - requires YAML reference knowledge |
| Scalability | Small to medium datasets | Large-scale, multi-run studies |
💡 Key Takeaway¶
Basic Projects excel at getting you from data to insights quickly - perfect for exploratory analysis and presentations. Advanced Projects shine when you need systematic, reproducible data management for complex, multi-sample studies with standardized workflows.
Both project types deliver the same rich, interactive dashboard experience - the difference lies in how your data is ingested and processed by the system.
🗺️ What's Next?¶
Now that you understand project types, you're ready to create your first interactive dashboard:
- 🎨 Create Your First Dashboard - Step-by-step tutorial
- 📚 CLI Usage Guide - Complete command documentation
- 🔧 Advanced Configuration - Multi-collection joins, custom workflows
Thomas Weber
August 2025