Hey folks, long time no speak! (Technically only a month, ish, but good to be back regardless)
So, for those who know me, you’ll know that I’m always keen on a hack or approach that makes me lazily efficient – that’s why I felt it was worth chatting about Agent Skills for the Databricks Assistant. In today’s AI-fuelled world, they’re just one of many productivity gains we can apply. Keep an eye out for a future post on my general approach to embracing productivity in the AI revolution.
What Are Agent Skills?
Right, so before I go any further, it would be remiss of me not to point you at the official documentation — it’s well worth a read if you want the full picture. But in short:
Skills extend Databricks Assistant with specialised capabilities. They package domain-specific knowledge and workflows that the Assistant can load when relevant to perform specific tasks.
Think of them as little instruction packs that tell the Assistant exactly what to do. Unlike custom instructions (which are always loaded), skills are loaded only when they’re relevant. The Assistant looks at what you’re asking, checks the skill descriptions, and pulls in the right one automatically. You can also @ mention a skill directly if you want to make sure it’s used.
They follow the Agent Skills open standard, which is a nice touch — it means the format isn’t locked into Databricks specifically.
How Do They Work?
Each skill lives in its own folder under ~/.assistant/skills/ on your workspace, and at a minimum needs a SKILL.md file with some frontmatter:
```
~/.assistant/skills/
├── etl-patterns/
│   └── SKILL.md
├── sql-format-guide/
│   └── SKILL.md
└── my-other-skill/
    ├── SKILL.md
    ├── reference-doc.md
    └── scripts/
        └── helper.py
```
The SKILL.md file has a YAML frontmatter block with a name and description (the description is what the Assistant uses to decide when to load it), followed by markdown content with your instructions, examples, and whatever else the Assistant needs to do its thing.
You can also include additional files — reference docs, scripts, templates — and reference them with relative paths from the skill folder. Skills work best when they’re focused on a single task or workflow. A narrow scope makes it easier for the Assistant to recognise when a skill applies.
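To make that concrete, here’s a minimal SKILL.md skeleton (the name and description are placeholders – swap in your own):

````markdown
---
name: my-skill-name
description: >
  One or two sentences describing what the skill does and, crucially,
  when the Assistant should load it.
---

# My Skill Name

Markdown instructions, examples, and references for the Assistant go here.
````

The frontmatter is the contract with the Assistant; everything below it is free-form content the Assistant reads once the skill is loaded.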
The Scenario
Alright, enough theory — let’s build something. I’m going to create two practical skills and demonstrate them using the Databricks Assistant. We’ll use the WanderBricks sample dataset (a travel/booking dataset that comes preinstalled in Databricks workspaces under samples.wanderbricks) for our examples.
The two skills we’ll create are:
- etl-patterns — Guides the Assistant through building medallion architecture (bronze/silver/gold) pipelines
- sql-format-guide — Teaches the Assistant your team’s SQL formatting and naming conventions
Both are the kind of thing where you’d normally be saying “no, leading commas please” or “remember to add the ingestion timestamp” for the fiftieth time. Skills let you codify that knowledge once and have the Assistant just know it.
Prerequisites
- A Unity Catalog-enabled Databricks Workspace (shocking, I know)
- Access to the Databricks Assistant in agent mode
- The `samples.wanderbricks` dataset for this demo (should be there by default)
- A terminal or file editor to create the skill files – I used Claude Code to create my examples today
The Fun Stuff — Building the Skills
Skill 1: ETL Patterns
This skill teaches the Assistant how to guide you through building medallion architecture pipelines. It covers the conventions for each layer (bronze, silver, gold) and includes a worked example using WanderBricks booking data.
Create the file at ~/.assistant/skills/etl-patterns/SKILL.md:
````markdown
---
name: etl-patterns
description: >
  Guide ETL pipeline development using medallion architecture (bronze,
  silver, gold). Helps with ingestion patterns, data cleansing
  transformations, and building aggregation layers. Use when asked about
  ETL, data pipelines, medallion architecture, or bronze/silver/gold tables.
---

# ETL Patterns — Medallion Architecture Guide

Help users build ETL pipelines following the medallion (bronze → silver → gold)
architecture pattern.

## Instructions

When a user asks for help with ETL or data pipelines:

1. Identify the layer they're working on:
   - Bronze: Raw ingestion, minimal transformation, preserve source fidelity
   - Silver: Cleansed, deduplicated, typed, joined with reference data
   - Gold: Business-level aggregates, ready for dashboards and reporting
2. Apply these patterns per layer:

### Bronze Layer

- Use COPY INTO or Auto Loader for file ingestion
- Ingest as raw strings or inferred schema with `_metadata` columns
- Add ingestion timestamp: `current_timestamp() AS ingested_at`
- Never filter or drop records at this stage

### Silver Layer

- Cast columns to correct types
- Deduplicate using `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)`
- Join with reference/dimension tables (e.g. enrich bookings with country data)
- Apply null handling: `COALESCE(column, 'UNKNOWN')`
- Add `processed_at` timestamp

### Gold Layer

- Pre-aggregate for known reporting use cases
- Use meaningful business names (e.g. `total_booking_revenue` not `sum_amt`)
- Partition by common query patterns (date, region)
- Document grain/granularity in table comments

## Example — WanderBricks Booking Pipeline

### Bronze

```sql
CREATE OR REPLACE TABLE bronze.wanderbricks_bookings AS
SELECT *, current_timestamp() AS ingested_at
FROM samples.wanderbricks.bookings;
```

### Silver

```sql
CREATE OR REPLACE TABLE silver.wanderbricks_bookings AS
SELECT
  b.booking_id,
  b.user_id,
  u.name AS user_name,
  u.user_type,
  c.country,
  c.continent,
  COALESCE(c.country_code, 'UNKNOWN') AS country_code,
  b.status,
  b.total_amount,
  b.created_at,
  current_timestamp() AS processed_at
FROM bronze.wanderbricks_bookings b
LEFT JOIN samples.wanderbricks.users u
  ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries c
  ON u.country = c.country;
```

### Gold

```sql
CREATE OR REPLACE TABLE gold.booking_summary_by_region AS
SELECT
  continent,
  country,
  status,
  COUNT(*) AS total_bookings,
  SUM(total_amount) AS total_revenue,
  AVG(total_amount) AS avg_booking_value,
  current_timestamp() AS refreshed_at
FROM silver.wanderbricks_bookings
GROUP BY continent, country, status;
```
````
The key thing here is the description field in the frontmatter. That’s what the Assistant uses to decide when this skill is relevant. If someone asks “help me build an ETL pipeline” or “how should I structure my bronze layer”, the Assistant will automatically pull this skill in. (Huzzah!)
The example section is important too — it gives the Assistant concrete patterns it can adapt. Rather than generating ETL code from scratch each time (which can be hit and miss), it’s got your team’s preferred patterns right there.
Skill 2: SQL Format Guide
This one’s for anyone who’s ever had that “tabs vs spaces” argument, but for SQL formatting. We’ve all got opinions, myself included, and this skill codifies your team’s conventions so the Assistant writes SQL your way from the start.
Create the file at ~/.assistant/skills/sql-format-guide/SKILL.md:
````markdown
---
name: sql-format-guide
description: >
  SQL formatting standards and code style conventions. Use when writing any
  SQL query, SELECT statement, CREATE TABLE, CTE, or JOIN. Applies leading
  comma formatting, UPPERCASE keywords, snake_case naming, and table
  aliasing rules. Always apply when generating or reviewing SQL code.
---

# SQL Format Guide — Team Conventions

Apply these formatting and naming conventions to all SQL code generated or
reviewed.

## Naming Conventions

### Tables & Views

- Use snake_case for all object names
- Prefix staging tables with `stg_`
- Prefix intermediate tables with `int_`
- Gold/reporting tables use business-friendly names without prefixes
- Always include the three-part name: `catalog.schema.table`

### Columns

- Use snake_case
- Boolean columns: prefix with `is_` or `has_` (e.g. `is_active`, `has_discount`)
- Date columns: suffix with `_date` (e.g. `booking_date`)
- Timestamp columns: suffix with `_at` (e.g. `created_at`, `processed_at`)
- Amount/money columns: suffix with `_amount` or `_value` (e.g. `total_amount`)
- Count columns: prefix with `total_` or suffix with `_count` (e.g. `total_bookings`)

## Formatting Rules

### General

- Keywords in UPPERCASE: SELECT, FROM, WHERE, JOIN, GROUP BY, ORDER BY
- Table/column names in lowercase
- One column per line in SELECT statements
- Leading commas (comma at start of line, not end)
- Indent with 2 spaces, not tabs

### Example

```sql
SELECT b.booking_id
  ,b.user_id
  ,u.name AS user_name
  ,c.country
  ,c.continent
  ,b.status
  ,b.total_amount
  ,b.created_at
FROM samples.wanderbricks.bookings AS b
LEFT JOIN samples.wanderbricks.users AS u
  ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries AS c
  ON u.country = c.country
WHERE b.status = 'confirmed'
  AND b.total_amount > 0
ORDER BY b.created_at DESC
```

### CTEs

- Always name CTEs descriptively
- Use CTEs instead of subqueries for readability
- Format as:

```sql
WITH booking_details AS (
  SELECT b.booking_id
    ,b.user_id
    ,b.total_amount
    ,b.status
  FROM samples.wanderbricks.bookings AS b
  WHERE b.status != 'cancelled'
),
user_totals AS (
  SELECT user_id
    ,COUNT(*) AS total_bookings
    ,SUM(total_amount) AS total_spend
  FROM booking_details
  GROUP BY user_id
)
SELECT u.name
  ,ut.total_bookings
  ,ut.total_spend
FROM user_totals AS ut
INNER JOIN samples.wanderbricks.users AS u
  ON ut.user_id = u.user_id
ORDER BY ut.total_spend DESC
```

### JOINs

- Always use explicit INNER JOIN, never implicit joins
- Always alias tables
- Put join condition on the next line, indented with ON
- For multi-condition joins, use AND on a new indented line

### WHERE Clauses

- First condition on same line as WHERE
- Subsequent conditions on new lines with AND/OR leading
- Use parentheses for mixed AND/OR logic
````
Now, I know the leading commas thing is contentious (I can already hear the trailing comma purists sharpening their pitchforks), but the beauty of skills is that you define the conventions for your team. Swap in whatever formatting rules work for you.
Ensuring Skills Are Always Loaded
Now, here’s the thing — in my testing, I found that the auto-loading of skills isn’t always reliable. The Assistant sometimes decides it doesn’t need a skill for what it considers a straightforward request, even when you’d really want it applied (a SQL formatting guide being the prime example — you want it on every query, not just the ones the Assistant deems complex enough).
The good news is there’s a simple workaround: add a reference to your skills in the workspace-level custom instructions. Custom instructions are always loaded into the Assistant’s context, so they serve as a reliable nudge to help ensure your skills are picked up.
You can set these in the Assistant settings under Custom Instructions. For example:
```
When writing or generating SQL code, always load and apply the sql-format-guide skill.
When building ETL pipelines or working with medallion architecture, always load and apply the etl-patterns skill.
```
It’s a bit of a belt-and-braces approach, but it works. The skill still does the heavy lifting with all the detailed conventions and examples — the custom instruction just makes sure it actually gets loaded. Think of it as the difference between having a policy document on the shelf and having someone remind you to read it before you start work.
You can also @ mention a skill directly in your prompt if you want to be explicit on a per-request basis — for example, @sql-format-guide write me a query... will guarantee the skill is loaded for that specific request.
Seeing Them in Action
Once the files are saved and the custom instructions are set, switch to the Databricks Assistant in agent mode and try some prompts.
For the etl-patterns skill, try something like:
“I need to build a medallion pipeline for the WanderBricks bookings data. Can you help me set up the bronze, silver, and gold layers?”
The Assistant should pick up the skill and guide you through each layer using the patterns we defined — including the ingestion timestamps, the joins with reference data, and the business-friendly aggregation layer.
For the sql-format-guide skill, try:
“Write me a query that shows the top 10 countries by total confirmed booking revenue, including the average booking value.”
And you should see the Assistant write SQL with leading commas, UPPERCASE keywords, proper aliases, and all the other conventions we specified. No more reformatting after the fact.
Here’s what that gold layer query looks like when generated with our formatting skill applied:
```sql
SELECT c.continent
  ,c.country
  ,b.status
  ,COUNT(*) AS total_bookings
  ,ROUND(SUM(b.total_amount), 2) AS total_revenue
  ,ROUND(AVG(b.total_amount), 2) AS avg_booking_value
FROM samples.wanderbricks.bookings AS b
LEFT JOIN samples.wanderbricks.users AS u
  ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries AS c
  ON u.country = c.country
WHERE b.status = 'confirmed'
GROUP BY c.continent, c.country, b.status
ORDER BY total_revenue DESC
LIMIT 10
```
Leading commas, proper aliases, UPPERCASE keywords — awesome beans. And when you run it against the WanderBricks data, you get results like:
| continent | country | total_bookings | total_revenue | avg_booking_value |
|---|---|---|---|---|
| Asia | India | 3,333 | 1,840,274.10 | 552.14 |
| Asia | China | 3,249 | 1,780,317.87 | 547.96 |
| North America | United States | 785 | 436,753.34 | 556.37 |
| Asia | Indonesia | 598 | 338,068.66 | 565.33 |
| Asia | Pakistan | 578 | 328,467.49 | 568.28 |
(I’ve truncated to five rows for the sake of brevity — you get the idea.)
Tips for Writing Good Skills
Having played around with these for a bit, here are a few things I’ve found help:
- Keep the scope narrow — One skill, one job. A skill that tries to cover “all SQL things” will be less reliable than separate skills for formatting, optimisation, and debugging.
- Nail the description — This is how the Assistant decides when to load your skill. Be specific and keyword-rich — include the actual terms people use when they’d need this skill. Think of it like SEO for your skill.
- Include concrete examples — The Assistant does much better when it has patterns to adapt rather than vague instructions. Show it what good looks like.
- Reinforce with custom instructions — For skills you want applied consistently (like formatting guides), add a one-liner in your workspace custom instructions to nudge the Assistant into loading the skill. It’s a small overhead for much more reliable behaviour.
- Iterate — Treat skills as living documents. If the Assistant keeps getting something wrong, update the skill. Small tweaks based on real usage make a big difference.
- Use additional files for complex skills — If your skill needs reference material (templates, schema docs, etc.), put them in separate files and reference them with relative paths. Keeps the main SKILL.md clean.
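As a quick sketch of that last tip, the main SKILL.md can point the Assistant at supporting files with relative links (file names here match the earlier example folder layout):

```markdown
For the full column glossary, see [reference-doc.md](reference-doc.md).
Use [scripts/helper.py](scripts/helper.py) for the validation helper.
```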
Scaling Skills Across Your Organisation
So far we’ve been creating skills for individual use, but what if you want to share a set of skills across your entire team or organisation? You don’t want every developer independently writing their own ETL patterns skill — you want one source of truth that everyone pulls from.
The good news is that, since skills are just files on the workspace filesystem, you can use Databricks Git folders to keep them in sync with a shared GitHub repository. The approach is straightforward:
- Create a GitHub repo containing your organisation’s standard skills (e.g. `your-org/databricks-skills`)
- Each user sets up their `~/.assistant/skills/` directory as a Git folder pointing to that repo
- When skills are updated in the repo, users simply pull the latest changes
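Scaffolding that shared repo locally before pushing it to GitHub might look something like this (a rough sketch – the repo name and skill names are illustrative, and the description text is a placeholder):

```shell
# Create the repo layout: one folder per skill (names illustrative)
mkdir -p databricks-skills/etl-patterns databricks-skills/sql-format-guide

# Each skill needs at least a SKILL.md with YAML frontmatter
cat > databricks-skills/etl-patterns/SKILL.md <<'EOF'
---
name: etl-patterns
description: >
  Guide ETL pipeline development using medallion architecture.
---
EOF

# Sanity check the structure before committing and pushing
ls databricks-skills
```

From there it’s a standard `git init`, commit, and push, after which each user can point their Git folder at the repo.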
This gives you version-controlled, centrally managed skills that the whole team benefits from. Think company-wide SQL conventions, standard ETL patterns, glossaries of business-specific terms, data quality check templates — all maintained in one place and distributed to every workspace user.
There’s also a growing ecosystem of open-source skill repositories worth keeping an eye on, including Databricks’ own public agent skills repo.
I’ll do a deeper dive on setting this up properly in a future post — including the Git folder configuration, repo structure, and how to handle team-specific vs organisation-wide skills. Watch this space.
Conclusion
That’s pretty much it. Agent Skills are a nice, lightweight way to teach the Databricks Assistant your team’s patterns and conventions without having to repeat yourself every time. The fact that they only load when relevant (rather than always sitting in the context window) is a smart design choice, and the open standard backing means it’s not a proprietary lock-in situation.
The auto-loading isn’t perfect yet — you’ll likely want to pair your skills with a custom instruction nudge for the most reliable experience — but the combination of the two works well. And as the feature matures, I’d expect the automatic matching to get better over time.
Hopefully, this has given you a few ideas for skills you could create for your own workflows. ETL patterns and SQL formatting are just the tip of the iceberg — think data quality checks, naming conventions for Unity Catalog objects, team-specific debugging runbooks, or even onboarding guides for new team members.
As always, let me know your thoughts (and other skill ideas, of course!). You can find me on the usual channels.