Databricks CheatSheet

Quick Start This cheatsheet covers essential Databricks notebook commands, SQL operations, PySpark transformations, and optimization techniques for the lakehouse platform. Databricks Notebook Commands Magic commands provide shortcuts for common operations in Databricks notebooks: Command Purpose Use Case %python Executes python code (default language) PySpark transformations, data processing %sql Executes SQL queries Querying tables and views %scala Executes scala code Spark API operations, JVM access %r Execute R code Statistical analysis and visualization %sh Shell commands on cluster nodes Git operations, system utilities %fs Databricks file system operations File management, DBFS interactions %md Markdown text formatting Documentation and cell titles %pip Install Python packages Adding Python dependencies %env Set environment variables Configuration and secrets %config Notebook configuration options Display settings, execution parameters %jobs Lists all running jobs Job monitoring %load Load external file contents Include external code %reload Reload Python modules Refresh imports %run Execute another notebook Code reuse and modularization %lsmagic List all available magic commands Discovery %who List variables in current scope Debugging and variable inspection %matplotlib Configure matplotlib backend Visualization setup Notebook Widgets # Create widgets dbutils.widgets.text("param_name", "default_value", "label") dbutils.widgets.dropdown("param_name", "default", ["option1", "option2"]) dbutils.widgets.multiselect("param_name", "default", ["option1", "option2"]) dbutils.widgets.combobox("param_name", "default", ["option1", "option2"]) # Get widget values param_value = dbutils.widgets.get("param_name") # Remove widget dbutils.widgets.remove("param_name") dbutils.widgets.removeAll() Secrets Management # Create secret scope dbutils.secrets.createScope("scope_name") # Store secret dbutils.secrets.put("scope_name", "secret_key", "secret_value") # Retrieve secret secret_value = dbutils.secrets.get("scope_name", "secret_key") # List secrets dbutils.secrets.list("scope_name") # Delete secret dbutils.secrets.delete("scope_name", "secret_key") Accessing Files /path/to/file (local) dbfs:/path/to/file (DBFS) file:/path/to/file (driver filesystem) s3://path/to/file (S3) /Volumes/catalog/schema/volume/path (Unity Catalog Volumes) Copying Files %fs cp file:/<path> /Volumes/<catalog>/<schema>/<volume>/<path> %python dbutils.fs.cp("file:/<path>", "/Volumes/<catalog>/<schema>/<volume>/<path>") %python dbutils.fs.cp("file:/databricks/driver/test", "dbfs:/repo", True) %sh cp /<path> /Volumes/<catalog>/<schema>/<volume>/<path> SQL Statements DDL - Data Definition Language (Schema & Table Operations) Create & Use Schema CREATE SCHEMA test; CREATE SCHEMA custom LOCATION 'dbfs:/custom'; USE SCHEMA test; Unity Catalog (UC) -- Create catalog CREATE CATALOG my_catalog COMMENT "Production catalog"; -- Create schema in UC CREATE SCHEMA my_catalog.my_schema; USE CATALOG my_catalog; USE SCHEMA my_schema; -- Create volume (for files) CREATE VOLUME my_catalog.my_schema.my_volume; ALTER VOLUME my_catalog.my_schema.my_volume OWNER TO `team@company.com`; -- List catalogs, schemas, volumes SHOW CATALOGS; SHOW SCHEMAS IN my_catalog; SHOW VOLUMES IN my_catalog.my_schema; -- Grant permissions GRANT USAGE ON CATALOG my_catalog TO `user@company.com`; GRANT READ_VOLUME ON VOLUME my_catalog.my_schema.my_volume TO `user@company.com`; Create Table CREATE TABLE test(col1 INT, col2 STRING, col3 STRING, col4 BIGINT, col5 INT, col6 FLOAT); CREATE TABLE test AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv'); CREATE TABLE test USING CSV LOCATION '/repo/data/test.csv'; CREATE TABLE test USING CSV OPTIONS (header="true") LOCATION '/repo/data/test.csv'; CREATE TABLE test AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv'); CREATE TABLE test AS ... CREATE TABLE test USING ... CREATE TABLE test(id INT, title STRING, col1 STRING, publish_time BIGINT, pages INT, price FLOAT) COMMENT 'This is comment for the table itself'; CREATE TABLE test AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.json', format => 'json'); CREATE TABLE test_raw AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv', sep => ';'); CREATE TABLE custom_table_test LOCATION 'dbfs:/custom-table' AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv'); CREATE TABLE test PARTITIONED BY (col1) AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv') CREATE TABLE users( firstname STRING, lastname STRING, full_name STRING GENERATED ALWAYS AS (concat(firstname, ' ', lastname)) ); CREATE OR REPLACE TABLE test AS SELECT * EXCEPT (_rescued_data) FROM read_files('/repo/data/test.csv'); CREATE OR REPLACE TABLE test AS SELECT * FROM json.`/repo/data/test.json`; CREATE OR REPLACE TABLE test AS SELECT * FROM read_files('/repo/data/test.csv'); Create View CREATE VIEW view_test AS SELECT * FROM test WHERE col1 = 'test'; CREATE VIEW view_test AS SELECT col1, col1 FROM test JOIN test2 ON test.col2 == test2.col2; CREATE TEMP VIEW temp_test AS SELECT * FROM test WHERE col1 = 'test'; CREATE TEMP VIEW temp_test AS SELECT * FROM read_files('/repo/data/test.csv'); CREATE GLOBAL TEMP VIEW view_test AS SELECT * FROM test WHERE col1 = 'test'; SELECT * FROM global_temp.view_test; CREATE TEMP VIEW jdbc_example USING JDBC OPTIONS ( url "<jdbc-url>", dbtable "<table-name>", user '<username>', password '<password>'); CREATE OR REPLACE TEMP VIEW test AS SELECT * FROM delta.`<logpath>`; CREATE VIEW event_log_raw AS SELECT * FROM event_log("<pipeline-id>"); CREATE OR REPLACE TEMP VIEW test_view AS SELECT test.col1 AS col1 FROM test_table WHERE col1 = 'value1' ORDER BY timestamp DESC LIMIT 1; Drop & Describe DROP TABLE test; SHOW TABLES; DESCRIBE EXTENDED test; DML - Data Manipulation Language (Data Operations) Select SELECT * FROM csv.`/repo/data/test.csv`; SELECT * FROM read_files('/repo/data/test.csv'); SELECT * FROM read_files('/repo/data/test.csv', format => 'csv', header => 'true', sep => ',') SELECT * FROM json.`/repo/data/test.json`; SELECT * FROM json.`/repo/data/*.json`; SELECT * FROM test WHERE year(from_unixtime(test_time)) > 1900; SELECT * FROM test WHERE title LIKE '%a%' SELECT * FROM test WHERE title LIKE 'a%' SELECT * FROM test WHERE title LIKE '%a' SELECT * FROM test TIMESTAMP AS OF '2024-01-01T00:00:00.000Z'; SELECT * FROM test VERSION AS OF 2; SELECT * FROM test@v2; SELECT * FROM event_log("<pipeline-id>"); SELECT count(*) FROM VALUES (NULL), (10), (10) AS example(col); SELECT count(col) FROM VALUES (NULL), (10), (10) AS example(col); SELECT count_if(col1 = 'test') FROM test; SELECT from_unixtime(test_time) FROM test; SELECT cast(test_time / 1 AS timestamp) FROM test; SELECT cast(cast(test_time AS BIGINT) AS timestamp) FROM test; SELECT element.sub_element FROM test; SELECT flatten(array(array(1, 2), array(3, 4))); SELECT * FROM ( SELECT col1, col2 FROM test ) PIVOT ( sum(col1) for col2 in ('item1','item2') ); SELECT *, CASE WHEN col1 > 10 THEN 'value1' ELSE 'value2' END FROM test; SELECT * FROM test ORDER BY (CASE WHEN col1 > 10 THEN col2 ELSE col3 END); WITH t(col1, col2) AS (SELECT 1, 2) SELECT * FROM t WHERE col1 = 1; SELECT details:flow_definition.output_dataset as output_dataset, details:flow_definition.input_datasets as input_dataset FROM event_log_raw, latest_update WHERE event_type = 'flow_definition' AND origin.update_id = latest_update.id; Insert INSERT OVERWRITE test SELECT * FROM read_files('/repo/data/test.csv'); INSERT INTO test(col1, col2) VALUES ('value1', 'value2'); Merge Into MERGE INTO test USING test_to_delete ON test.col1 = test_to_delete.col1 WHEN MATCHED THEN DELETE; MERGE INTO test USING test_to_update ON test.col1 = test_to_update.col1 WHEN MATCHED THEN UPDATE SET *; MERGE INTO test USING test_to_insert ON test.col1 = test_to_insert.col1 WHEN NOT MATCHED THEN INSERT *; Copy Into COPY INTO test FROM '/repo/data' FILEFORMAT = CSV FILES = ('test.csv') FORMAT_OPTIONS('header' = 'true', 'inferSchema' = 'true'); Spark DataFrame API PySpark is the Python API for Apache Spark, enabling distributed data processing on the Databricks platform. ...

April 4, 2026 · 9 min · James M

Artemis II: Breaking the Distance Record

As the Orion spacecraft sweeps around the lunar far side, the four-person crew of Artemis II is doing more than just testing hardware—they are venturing further into the cosmos than any human being has ever traveled. Surpassing Apollo 13 For over five decades, the record for the farthest distance humans have traveled from Earth was held by the crew of Apollo 13. In April 1970, due to an emergency “free-return” trajectory, Jim Lovell, Jack Swigert, and Fred Haise reached a distance of approximately 400,171 kilometers (248,655 miles) from Earth. ...

April 4, 2026 · 2 min · James M

Quantum Computing: A Threat to Bitcoin?

Overview Quantum computing represents one of the most significant theoretical threats to modern cryptography. For Bitcoin, the primary concern lies in the potential for quantum computers to run Shor’s Algorithm, which could efficiently solve the discrete logarithm problem that secures Bitcoin’s public-key cryptography (ECDSA). The Vulnerabilities ECDSA (Elliptic Curve Digital Signature Algorithm) — Currently used to sign Bitcoin transactions. A sufficiently powerful quantum computer could derive a private key from its corresponding public key. Public Key Exposure — While Bitcoin addresses are hashed (providing a layer of protection), the public key is revealed to the network when a transaction is initiated, creating a window of vulnerability before the block is mined. Mining (SHA-256) — Grover’s Algorithm could speed up mining, but this is a quadratic improvement rather than exponential, meaning it could likely be mitigated by increasing network difficulty or hash rates. Mitigation and Post-Quantum Bitcoin The Bitcoin community is well aware of these risks. Potential solutions include: ...

April 4, 2026 · 2 min · James M

Native Instruments: From Preliminary Insolvency to M&A - What Comes Next

When Native Instruments entered preliminary insolvency in late January, it felt like a seismic moment. Two months later, the picture has gotten clearer—and in some ways, more complex. The company has now moved into formal insolvency proceedings, and simultaneously revealed it’s in active merger and acquisition talks with multiple interested buyers. This isn’t a bankruptcy death spiral; it’s a controlled restructuring. But it raises harder questions about what went wrong, and what salvation might actually look like. ...

April 4, 2026 · 6 min · James M

Taste Is the New Scarcity

If intelligence is becoming a commodity, then something else becomes precious. When you can generate a thousand solutions to a problem with a prompt, the question is no longer “can I get an answer?” The question becomes “which answer is good?” When you can write code, design systems, draft strategies, analyze data, or explore ideas simply by asking, the bottleneck shifts. It is no longer thinking capacity. It is judgment. ...

April 4, 2026 · 5 min · James M

Polkadot 2026: From Infrastructure to Applications

The Pivot Year: Polkadot’s Strategic Shift in 2026 Polkadot has undergone a fundamental transformation in 2025-2026. After years of building infrastructure layers, the ecosystem is making a decisive pivot toward user-facing applications. This isn’t just a narrative shift—it’s embedded in technical upgrades, tokenomics redesigns, and validator economics that reflect a maturing network ready to compete at the application layer. Timing: This transformation arrives as traditional finance begins acknowledging blockchain infrastructure, and as the broader crypto market cycle approaches a pivotal moment for adoption. ...

April 4, 2026 · 5 min · James M

Physical Modeling Synthesis: The Underrated Future of Sound Design

If you’ve spent any time with Pianoteq or the Audio Modeling SWAM instruments, you’ve felt something different. Not the crisp accuracy of a sampled library, not the flexibility of wavetable synthesis—but something that responds like an instrument. Strings that vibrate with sympathetic resonance. Piano keys with wooden resistance. A cello that sings differently when you bow it hard versus soft. This is physical modeling: mathematics as an instrument, not just a sampler or synth engine. ...

April 4, 2026 · 7 min · James M

NASA Artemis II Tracking Dashboards

About NASA’s Artemis II mission represents a critical step in returning humans to the Moon. Real-time tracking dashboards provide the public with live updates on mission status, vehicle telemetry, and launch preparations. These dashboards showcase NASA’s commitment to transparency, allowing space enthusiasts and stakeholders to monitor every aspect of the mission as it unfolds. Official Resources Artemis II - NASA.gov — Official NASA information and resources for the Artemis II mission. ...

April 4, 2026 · 1 min · James M

Personal AI Development Stack

This guide documents a highly productive, AI-driven development stack using cloud-based LLMs, terminal tools, IDEs, and mobile access. It is designed for developers who want persistent workflows, AI-powered coding assistance, and flexible access from multiple devices. Architecture Overview Hardware & Connectivity iPhone ──(Tailscale VPN)──→ MacBook Air ├─ tmux/zellij (session persistence) └─ Lungo, Patterned (keep-awake utilities) IDE & Editor Layer PRIMARY: Cursor AI — fastest iteration with native AI engine SECONDARY: VS Code (+ Cline + Continue.dev) — battle-tested TERMINAL: Claude Code CLI — heavy multi-file work SSH: Termius (mobile remote access) AI Tools & LLM Backends AGENTS: Cline, Claude Code, Aider, Windsurf LOCAL: Ollama (free, instant completions) ROUTERS: OpenRouter, LiteLLM (cost & speed optimization) WEB: ChatGPT, Perplexity, Claude Web, Grok Tool Selection Decision Tree Use this to pick the right tool for the task: ...

April 3, 2026 · 8 min · James M

What the Amiga Got Right (That We're Still Copying)

What the Amiga Got Right (That We’re Still Copying) The Commodore Amiga was not the most successful computer. It was not the fastest. It was not the cheapest. It was introduced in 1985, bought by Commodore in a panic, and discontinued by 1994 as the company collapsed. By most commercial metrics, it was a failure. Yet almost every good idea in modern computing traces back to the Amiga. Preemptive multitasking. Graphics layers and compositing. Named pipes. Memory protection. Hardware acceleration. Plug-and-play peripherals. Scripting languages. Digital audio and video editing. Networking. The Amiga did these things in 1985 when IBM PCs were still running in 8-bit mode. ...

April 3, 2026 · 10 min · James M