Introduction To Agent Skills With The Databricks Assistant

Hey folks, long time no speak! (Technically only a month, ish, but good to be back regardless)

So, for those who know me, you’ll know I’m always keen on a hack or approach that makes me lazily efficient. That’s why I felt it was worth chatting about Agent Skills for the Databricks Assistant: in today’s AI-fuelled world, they’re one of many productivity gains we can apply. (I’ll cover my general approach to embracing productivity in the AI revolution in a future post.)

What Are Agent Skills?

Right, so before I go any further, it would be remiss of me not to point you at the official documentation — it’s well worth a read if you want the full picture. But in short:

Skills extend Databricks Assistant with specialised capabilities. They package domain-specific knowledge and workflows that the Assistant can load when relevant to perform specific tasks.

Think of them as little instruction packs that tell the Assistant exactly what to do. Unlike custom instructions (which are always loaded), skills are loaded only when they’re relevant. The Assistant looks at what you’re asking, checks the skill descriptions, and pulls in the right one automatically. You can also @ mention a skill directly if you want to make sure it’s used.

They follow the Agent Skills open standard, which is a nice touch — it means the format isn’t locked into Databricks specifically.

How Do They Work?

Each skill lives in its own folder under ~/.assistant/skills/ in your workspace, and at a minimum needs a SKILL.md file with some frontmatter:

~/.assistant/skills/
├── etl-patterns/
│   └── SKILL.md
├── sql-format-guide/
│   └── SKILL.md
└── my-other-skill/
    ├── SKILL.md
    ├── reference-doc.md
    └── scripts/
        └── helper.py

The SKILL.md file has a YAML frontmatter block with a name and description (the description is what the Assistant uses to decide when to load it), followed by markdown content with your instructions, examples, and whatever else the Assistant needs to do its thing.

You can also include additional files — reference docs, scripts, templates — and reference them with relative paths from the skill folder. Skills work best when they’re focused on a single task or workflow. A narrow scope makes it easier for the Assistant to recognise when a skill applies.
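If you want to bootstrap a skill from a terminal, here’s a minimal sketch. The `etl-patterns` folder name and placeholder description are just examples; swap in your own.

```shell
# Skills live under ~/.assistant/skills/ by default; override SKILLS_DIR if yours differ
SKILLS_DIR="${SKILLS_DIR:-$HOME/.assistant/skills}"
mkdir -p "$SKILLS_DIR/etl-patterns"

# Write a minimal SKILL.md: YAML frontmatter (name + description), then markdown instructions
cat > "$SKILLS_DIR/etl-patterns/SKILL.md" <<'EOF'
---
name: etl-patterns
description: >
  Guide ETL pipeline development using medallion architecture.
---
# ETL Patterns

Your instructions, examples, and conventions go here.
EOF
```

Remember the description field is what the Assistant matches against, so don’t leave it as a placeholder for long.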

The Scenario

Alright, enough theory — let’s build something. I’m going to create two practical skills and demonstrate them using the Databricks Assistant. We’ll use the WanderBricks sample dataset (a travel/booking dataset that comes preinstalled in Databricks workspaces under samples.wanderbricks) for our examples.

The two skills we’ll create are:

  1. etl-patterns — Guides the Assistant through building medallion architecture (bronze/silver/gold) pipelines
  2. sql-format-guide — Teaches the Assistant your team’s SQL formatting and naming conventions

Both are the kind of thing where you’d normally be saying “no, leading commas please” or “remember to add the ingestion timestamp” for the fiftieth time. Skills let you codify that knowledge once and have the Assistant just know it.

Prerequisites

  • A Unity Catalog-enabled Databricks Workspace (shocking, I know)
  • Access to the Databricks Assistant in agent mode
  • The samples.wanderbricks dataset for this demo (should be there by default)
  • A terminal or file editor to create the skill files – I used Claude Code to create my examples today

The Fun Stuff — Building the Skills

Skill 1: ETL Patterns

This skill teaches the Assistant how to guide you through building medallion architecture pipelines. It covers the conventions for each layer (bronze, silver, gold) and includes a worked example using WanderBricks booking data.

Create the file at ~/.assistant/skills/etl-patterns/SKILL.md:

---
name: etl-patterns
description: >
  Guide ETL pipeline development using medallion architecture (bronze, silver, gold).
  Helps with ingestion patterns, data cleansing transformations, and building
  aggregation layers. Use when asked about ETL, data pipelines, medallion architecture,
  or bronze/silver/gold tables.
---

# ETL Patterns — Medallion Architecture Guide

Help users build ETL pipelines following the medallion (bronze → silver → gold)
architecture pattern.

## Instructions

When a user asks for help with ETL or data pipelines:

1. **Identify the layer** they're working on:
   - **Bronze**: Raw ingestion, minimal transformation, preserve source fidelity
   - **Silver**: Cleansed, deduplicated, typed, joined with reference data
   - **Gold**: Business-level aggregates, ready for dashboards and reporting
2. **Apply these patterns per layer**:

### Bronze Layer

- Use `COPY INTO` or Auto Loader for file ingestion
- Ingest as raw strings or inferred schema with `_metadata` columns
- Add ingestion timestamp: `current_timestamp() AS ingested_at`
- Never filter or drop records at this stage

### Silver Layer

- Cast columns to correct types
- Deduplicate using `ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC)`
- Join with reference/dimension tables (e.g. enrich bookings with country data)
- Apply null handling: `COALESCE(column, 'UNKNOWN')`
- Add `processed_at` timestamp

### Gold Layer

- Pre-aggregate for known reporting use cases
- Use meaningful business names (e.g. `total_booking_revenue` not `sum_amt`)
- Partition by common query patterns (date, region)
- Document grain/granularity in table comments

## Example — WanderBricks Booking Pipeline

### Bronze

```sql
CREATE OR REPLACE TABLE bronze.wanderbricks_bookings AS
SELECT *, current_timestamp() AS ingested_at
FROM samples.wanderbricks.bookings;
```

### Silver

```sql
CREATE OR REPLACE TABLE silver.wanderbricks_bookings AS
SELECT
  b.booking_id,
  b.user_id,
  u.name AS user_name,
  u.user_type,
  c.country,
  c.continent,
  COALESCE(c.country_code, 'UNKNOWN') AS country_code,
  b.status,
  b.total_amount,
  b.created_at,
  current_timestamp() AS processed_at
FROM bronze.wanderbricks_bookings b
LEFT JOIN samples.wanderbricks.users u ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries c ON u.country = c.country;
```

### Gold

```sql
CREATE OR REPLACE TABLE gold.booking_summary_by_region AS
SELECT
  continent,
  country,
  status,
  COUNT(*) AS total_bookings,
  SUM(total_amount) AS total_revenue,
  AVG(total_amount) AS avg_booking_value,
  current_timestamp() AS refreshed_at
FROM silver.wanderbricks_bookings
GROUP BY continent, country, status;
```

The key thing here is the description field in the frontmatter. That’s what the Assistant uses to decide when this skill is relevant. If someone asks “help me build an ETL pipeline” or “how should I structure my bronze layer”, the Assistant will automatically pull this skill in. (Huzzah!)

The example section is important too — it gives the Assistant concrete patterns it can adapt. Rather than generating ETL code from scratch each time (which can be hit and miss), it’s got your team’s preferred patterns right there.

Skill 2: SQL Format Guide

This one’s for anyone who’s ever had that “tabs vs spaces” argument, but for SQL formatting. We’ve all got opinions, myself included, and this skill codifies your team’s conventions so the Assistant writes SQL your way from the start.

Create the file at ~/.assistant/skills/sql-format-guide/SKILL.md:

---
name: sql-format-guide
description: >
  SQL formatting standards and code style conventions. Use when writing any SQL
  query, SELECT statement, CREATE TABLE, CTE, or JOIN. Applies leading comma
  formatting, UPPERCASE keywords, snake_case naming, and table aliasing rules.
  Always apply when generating or reviewing SQL code.
---

# SQL Format Guide — Team Conventions

Apply these formatting and naming conventions to all SQL code generated or reviewed.

## Naming Conventions

### Tables & Views

- Use `snake_case` for all object names
- Prefix staging tables with `stg_`
- Prefix intermediate tables with `int_`
- Gold/reporting tables use business-friendly names without prefixes
- Always include the three-part name: `catalog.schema.table`

### Columns

- Use `snake_case`
- Boolean columns: prefix with `is_` or `has_` (e.g. `is_active`, `has_discount`)
- Date columns: suffix with `_date` (e.g. `booking_date`)
- Timestamp columns: suffix with `_at` (e.g. `created_at`, `processed_at`)
- Amount/money columns: suffix with `_amount` or `_value` (e.g. `total_amount`)
- Count columns: prefix with `total_` or suffix with `_count` (e.g. `total_bookings`)

## Formatting Rules

### General

- Keywords in UPPERCASE: `SELECT`, `FROM`, `WHERE`, `JOIN`, `GROUP BY`, `ORDER BY`
- Table/column names in lowercase
- One column per line in SELECT statements
- Leading commas (comma at start of line, not end)
- Indent with 2 spaces, not tabs

### Example

```sql
SELECT
  b.booking_id
  ,b.user_id
  ,u.name AS user_name
  ,c.country
  ,c.continent
  ,b.status
  ,b.total_amount
  ,b.created_at
FROM samples.wanderbricks.bookings AS b
LEFT JOIN samples.wanderbricks.users AS u
  ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries AS c
  ON u.country = c.country
WHERE b.status = 'confirmed'
  AND b.total_amount > 0
ORDER BY b.created_at DESC
```

### CTEs

- Always name CTEs descriptively
- Use CTEs instead of subqueries for readability
- Format as:

```sql
WITH booking_details AS (
  SELECT
    b.booking_id
    ,b.user_id
    ,b.total_amount
    ,b.status
  FROM samples.wanderbricks.bookings AS b
  WHERE b.status != 'cancelled'
)
,user_totals AS (
  SELECT
    user_id
    ,COUNT(*) AS total_bookings
    ,SUM(total_amount) AS total_spend
  FROM booking_details
  GROUP BY user_id
)
SELECT
  u.name
  ,ut.total_bookings
  ,ut.total_spend
FROM user_totals AS ut
INNER JOIN samples.wanderbricks.users AS u
  ON ut.user_id = u.user_id
ORDER BY ut.total_spend DESC
```

### JOINs

- Always use explicit `INNER JOIN`, never implicit joins
- Always alias tables
- Put the join condition on the next line, indented, starting with `ON`
- For multi-condition joins, use `AND` on a new indented line

### WHERE Clauses

- First condition on the same line as `WHERE`
- Subsequent conditions on new lines with a leading `AND`/`OR`
- Use parentheses for mixed `AND`/`OR` logic

Now, I know the leading commas thing is contentious (I can already hear the trailing comma purists sharpening their pitchforks), but the beauty of skills is that you define the conventions for your team. Swap in whatever formatting rules work for you.

Ensuring Skills Are Always Loaded

Now, here’s the thing — in my testing, I found that the auto-loading of skills isn’t always reliable. The Assistant sometimes decides it doesn’t need a skill for what it considers a straightforward request, even when you’d really want it applied (a SQL formatting guide being the prime example — you want it on every query, not just the ones the Assistant deems complex enough).

The good news is there’s a simple workaround: add a reference to your skills in the workspace-level custom instructions. Custom instructions are always loaded into the Assistant’s context, so they serve as a reliable nudge to help ensure your skills are picked up.

You can set these in the Assistant settings under Custom Instructions. For example:

When writing or generating SQL code, always load and apply the sql-format-guide skill.
When building ETL pipelines or working with medallion architecture, always load and apply the etl-patterns skill.

It’s a bit of a belt-and-braces approach, but it works. The skill still does the heavy lifting with all the detailed conventions and examples — the custom instruction just makes sure it actually gets loaded. Think of it as the difference between having a policy document on the shelf and having someone remind you to read it before you start work.

You can also @ mention a skill directly in your prompt if you want to be explicit on a per-request basis — for example, @sql-format-guide write me a query... will guarantee the skill is loaded for that specific request.

Seeing Them in Action

Once the files are saved and the custom instructions are set, switch to the Databricks Assistant in agent mode and try some prompts.

For the etl-patterns skill, try something like:

“I need to build a medallion pipeline for the WanderBricks bookings data. Can you help me set up the bronze, silver, and gold layers?”

The Assistant should pick up the skill and guide you through each layer using the patterns we defined — including the ingestion timestamps, the joins with reference data, and the business-friendly aggregation layer.

For the sql-format-guide skill, try:

“Write me a query that shows the top 10 countries by total confirmed booking revenue, including the average booking value.”

And you should see the Assistant write SQL with leading commas, UPPERCASE keywords, proper aliases, and all the other conventions we specified. No more reformatting after the fact.

Here’s what that gold layer query looks like when generated with our formatting skill applied:

```sql
SELECT
  c.continent
  ,c.country
  ,b.status
  ,COUNT(*) AS total_bookings
  ,ROUND(SUM(b.total_amount), 2) AS total_revenue
  ,ROUND(AVG(b.total_amount), 2) AS avg_booking_value
FROM samples.wanderbricks.bookings AS b
LEFT JOIN samples.wanderbricks.users AS u
  ON b.user_id = u.user_id
LEFT JOIN samples.wanderbricks.countries AS c
  ON u.country = c.country
WHERE b.status = 'confirmed'
GROUP BY c.continent, c.country, b.status
ORDER BY total_revenue DESC
LIMIT 10
```

Leading commas, proper aliases, UPPERCASE keywords — awesome beans. And when you run it against the WanderBricks data, you get results like:

| continent | country | total_bookings | total_revenue | avg_booking_value |
|---|---|---|---|---|
| Asia | India | 3,333 | 1,840,274.10 | 552.14 |
| Asia | China | 3,249 | 1,780,317.87 | 547.96 |
| North America | United States | 785 | 436,753.34 | 556.37 |
| Asia | Indonesia | 598 | 338,068.66 | 565.33 |
| Asia | Pakistan | 578 | 328,467.49 | 568.28 |

(I’ve truncated to five rows for the sake of brevity — you get the idea.)

Tips for Writing Good Skills

Having played around with these for a bit, here are a few things I’ve found help:

  • Keep the scope narrow — One skill, one job. A skill that tries to cover “all SQL things” will be less reliable than separate skills for formatting, optimisation, and debugging.
  • Nail the description — This is how the Assistant decides when to load your skill. Be specific and keyword-rich — include the actual terms people use when they’d need this skill. Think of it like SEO for your skill.
  • Include concrete examples — The Assistant does much better when it has patterns to adapt rather than vague instructions. Show it what good looks like.
  • Reinforce with custom instructions — For skills you want applied consistently (like formatting guides), add a one-liner in your workspace custom instructions to nudge the Assistant into loading the skill. It’s a small overhead for much more reliable behaviour.
  • Iterate — Treat skills as living documents. If the Assistant keeps getting something wrong, update the skill. Small tweaks based on real usage make a big difference.
  • Use additional files for complex skills — If your skill needs reference material (templates, schema docs, etc.), put them in separate files and reference them with relative paths. Keeps the main SKILL.md clean.
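On that last tip: if you end up accumulating a pile of skill folders, a quick sanity check helps. Here’s a minimal sketch (my own helper, not anything shipped by Databricks) that walks a skills directory and flags any SKILL.md missing the frontmatter fields the Assistant relies on:

```python
from pathlib import Path


def check_skill(skill_md: Path) -> list[str]:
    """Return a list of problems with a SKILL.md file (empty list = looks OK)."""
    problems = []
    lines = skill_md.read_text().splitlines()
    # Frontmatter is the block between the first two '---' lines
    if not lines or lines[0].strip() != "---":
        return ["missing YAML frontmatter"]
    try:
        end = lines[1:].index("---") + 1
    except ValueError:
        return ["frontmatter never closed with '---'"]
    frontmatter = "\n".join(lines[1:end])
    for field in ("name", "description"):
        if f"{field}:" not in frontmatter:
            problems.append(f"missing '{field}' in frontmatter")
    return problems


def check_all_skills(skills_dir: Path) -> dict[str, list[str]]:
    """Check every */SKILL.md under skills_dir; return {skill_folder: problems}."""
    return {
        p.parent.name: check_skill(p)
        for p in sorted(skills_dir.glob("*/SKILL.md"))
    }
```

Point it at your ~/.assistant/skills/ folder; any skill with a non-empty problem list won’t have the metadata the Assistant needs to match on.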

Scaling Skills Across Your Organisation

So far we’ve been creating skills for individual use, but what if you want to share a set of skills across your entire team or organisation? You don’t want every developer independently writing their own ETL patterns skill — you want one source of truth that everyone pulls from.

The good news is that, since skills are just files on the workspace filesystem, you can use Databricks Git folders to keep them in sync with a shared GitHub repository. The approach is straightforward:

  1. Create a GitHub repo containing your organisation’s standard skills (e.g. your-org/databricks-skills)
  2. Each user sets up their ~/.assistant/skills/ directory as a Git folder pointing to that repo
  3. When skills are updated in the repo, users simply pull the latest changes

This gives you version-controlled, centrally managed skills that the whole team benefits from. Think company-wide SQL conventions, standard ETL patterns, glossaries of business-specific terms, data quality check templates — all maintained in one place and distributed to every workspace user.

There’s also a growing ecosystem of open-source skill repositories worth keeping an eye on, including Databricks’ own public agent skills repo.

I’ll do a deeper dive on setting this up properly in a future post — including the Git folder configuration, repo structure, and how to handle team-specific vs organisation-wide skills. Watch this space.

Conclusion

That’s pretty much it. Agent Skills are a nice, lightweight way to teach the Databricks Assistant your team’s patterns and conventions without having to repeat yourself every time. The fact that they only load when relevant (rather than always sitting in the context window) is a smart design choice, and the open standard backing means it’s not a proprietary lock-in situation.

The auto-loading isn’t perfect yet — you’ll likely want to pair your skills with a custom instruction nudge for the most reliable experience — but the combination of the two works well. And as the feature matures, I’d expect the automatic matching to get better over time.

Hopefully, this has given you a few ideas for skills you could create for your own workflows. ETL patterns and SQL formatting are just the tip of the iceberg — think data quality checks, naming conventions for Unity Catalog objects, team-specific debugging runbooks, or even onboarding guides for new team members.

As always, let me know your thoughts (and other skill ideas, of course!). You can find me on the usual channels.

Useful Links
