Complete Usage Guide
Installation
pip install kgz
Single dependency: websocket-client.
Getting Your Kaggle URL
- Open a notebook on kaggle.com/code
- Enable GPU or TPU in Settings
- Copy the URL from your browser:
  https://kkb-production.jupyter-proxy.kaggle.net/k/12345/eyJhbG.../proxy
- Pass it to kgz
The URL contains a session JWT. It expires when the session ends (~12h GPU, ~9h TPU).
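To make the URL anatomy concrete, here is a small standalone sketch (not part of kgz) that splits a proxy URL into its kernel id and JWT, assuming the `.../k/<kernel-id>/<jwt>/proxy` shape shown above:

```python
import re

# Hypothetical helper (not part of kgz): split a Kaggle jupyter-proxy URL
# into its kernel id and session JWT, assuming the shape .../k/<id>/<jwt>/proxy
URL_RE = re.compile(r"https://[^/]+/k/(?P<kid>\d+)/(?P<jwt>[^/]+)/proxy")

def parse_kaggle_url(url):
    m = URL_RE.match(url)
    if m is None:
        raise ValueError("not a Kaggle jupyter-proxy URL")
    return m.group("kid"), m.group("jwt")

kid, jwt = parse_kaggle_url(
    "https://kkb-production.jupyter-proxy.kaggle.net/k/12345/eyJhbG.../proxy")
print(kid)  # 12345
```

If a call fails with an authentication error, the JWT segment has usually expired and you need a fresh URL from the browser.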
Core: Connect & Execute
from kgz import Kernel
k = Kernel(url)
k = Kernel(url, name="my-session") # Named for save/resume
Execute Code
result = k.execute("print('hello')", stream=False)
result.success # True
result.stdout # "hello\n"
result.return_value # None (print has no return)
result.error_name # None
result.elapsed_seconds # 0.1
Important: Use stream=False in scripts. The default stream=True prints output to your local stdout as it arrives, which is convenient interactively but noisy in automation.
Expressions Return Values
result = k.execute("2 + 2", stream=False)
result.return_value # "4"
Error Handling
result = k.execute("1/0", stream=False)
result.success # False
result.error_name # "ZeroDivisionError"
result.error_value # "division by zero"
# Or raise exceptions
from kgz import KernelError
try:
    k.execute("1/0", raise_on_error=True, stream=False)
except KernelError as e:
    print(e.result.traceback)
Quota Tracking
Kaggle limits: 30h/week GPU, 20h/week TPU.
k.start_quota_tracking() # Start counting
# ... work ...
k.stop_quota_tracking() # Log usage
k.quota_summary()
# GPU: 2.1h used / 30h quota (27.9h remaining)
# Session limit: 9.9h remaining
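The summary above is straightforward bookkeeping. A minimal sketch of the arithmetic, using the weekly GPU quota from the limits table (kgz's internals may differ):

```python
# Illustrative only: the arithmetic behind a quota summary line.
# Kaggle's weekly GPU quota is 30h; kgz tracks elapsed time between
# start_quota_tracking() and stop_quota_tracking() and subtracts it.
GPU_QUOTA_HOURS = 30.0

def quota_summary(used_hours, quota_hours=GPU_QUOTA_HOURS):
    remaining = quota_hours - used_hours
    return (f"GPU: {used_hours:.1f}h used / {quota_hours:.0f}h quota "
            f"({remaining:.1f}h remaining)")

print(quota_summary(2.1))  # GPU: 2.1h used / 30h quota (27.9h remaining)
```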
Output Caching
Cache idempotent cells (pip install, imports, data loading):
result = k.execute_cached("import jax; print(jax.__version__)")
# First call: 100ms (remote execution)
# Second call: 1ms (local cache) — 98x speedup
Errors are not cached — only successful results.
k.clear_cache() # Reset
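Conceptually, a code-keyed cache like this hashes the cell source and stores only successful results. The sketch below illustrates the idea (kgz may key differently, e.g. on normalized source):

```python
import hashlib

# Sketch of a code-keyed result cache in the spirit of execute_cached().
# Assumption: results are keyed by a hash of the source; errors are never stored.
class ResultCache:
    def __init__(self):
        self._store = {}

    @staticmethod
    def key(code):
        return hashlib.sha256(code.encode()).hexdigest()

    def get(self, code):
        return self._store.get(self.key(code))

    def put(self, code, result, success):
        if success:  # only successful results are cached
            self._store[self.key(code)] = result

cache = ResultCache()
cache.put("1/0", None, success=False)   # error: not cached
cache.put("2 + 2", "4", success=True)
print(cache.get("2 + 2"))  # 4
print(cache.get("1/0"))    # None
```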
Pipeline
Run a sequence of steps with quota tracking, caching, and notifications:
results = k.pipeline([
    ("Install", "pip install -q jax flax"),
    ("Check GPU", "import jax; print(jax.devices())"),
    ("Load data", "data = load_dataset('tiny')"),
    ("Train", "train(data, steps=5000)"),
], notify_url="https://hooks.slack.com/...", use_cache=True)
# Setup steps are cached, training is not
# Slack notification on completion or failure
for label, result in results:
    print(f"{label}: {'OK' if result.success else 'FAIL'}")
Multi-Cell Execution
Execute cells sequentially with shared state:
results = k.execute_notebook([
    "import jax",
    "model = build()",
    "loss = train(model)",
], stop_on_error=True, stream=False)
Import & Run .ipynb
results = k.run_notebook("training.ipynb", stop_on_error=True)
Inspect Remote State
Variable Snapshot
snap = k.snapshot()
# {"model": {"type": "GPT", "shape": "(18M,)", "repr": "GPT(...)"},
# "loss": {"type": "float", "repr": "2.31"},
# "data": {"type": "ndarray", "shape": "(106212345,)", "len": 106212345}}
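The snapshot output above can be produced by plain namespace introspection. A local sketch of the idea (kgz runs this kind of inspection remotely; the exact fields are an assumption based on the example):

```python
# Illustrative only: build a snapshot dict from a namespace by recording
# each value's type, truncated repr, and shape/len when available.
def snapshot(namespace, max_repr=40):
    out = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue
        info = {"type": type(value).__name__, "repr": repr(value)[:max_repr]}
        if hasattr(value, "shape"):
            info["shape"] = str(value.shape)
        if hasattr(value, "__len__"):
            info["len"] = len(value)
        out[name] = info
    return out

snap = snapshot({"loss": 2.31, "data": [1, 2, 3]})
print(snap["loss"])  # {'type': 'float', 'repr': '2.31'}
```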
GPU/TPU Resources
res = k.resources()
# {"backend": "gpu", "device_count": 2,
# "gpus": [{"utilization": 85, "memory_used_mb": 12000, "memory_total_mb": 15360}],
# "cpu_percent": 17.2, "ram_used_gb": 8.6, "ram_total_gb": 33.7}
File Operations
Upload / Download
from kgz import upload_file, download_file
from kgz.file_ops import upload_directory, list_files
upload_file(url, "model.py", "model.py")
upload_directory(url, "./src", "src")
download_file(url, "/kaggle/working/results.json", "./results.json")
Model Download (with size)
k.download_model("/kaggle/working/model.pkl", "./model.pkl")
# Downloading model.pkl (73.2 MB)...
# Saved: ./model.pkl
File Sync (Watch Mode)
from kgz import FileSync
sync = FileSync(url, "./src", "/kaggle/working/src")
sync.push() # One-shot upload of changed files
sync.start() # Background watcher — auto-uploads on change
# ... edit files locally ...
sync.stop()
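Change detection for a `push()`-style sync can be as simple as comparing file mtimes against the last push. A sketch of that approach (an assumption; kgz may compare hashes or sizes instead):

```python
import os

# Illustrative mtime-based change detection for a one-shot sync push.
def changed_files(local_dir, last_push_times):
    """Return paths modified since the mtime recorded at the previous push."""
    changed = []
    for root, _dirs, files in os.walk(local_dir):
        for fname in files:
            path = os.path.join(root, fname)
            mtime = os.path.getmtime(path)
            if last_push_times.get(path, 0) < mtime:
                changed.append(path)
                last_push_times[path] = mtime
    return changed
```

A background watcher then just re-runs this check on an interval or on filesystem events and uploads whatever it returns.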
Environment
Secrets
k.set_env(HF_TOKEN="hf_...", WANDB_API_KEY="...")
# Secrets are excluded from execution history and notebook export
Snapshot / Restore
k.snapshot_env() # pip freeze → ~/.kgz/{name}-reqs.txt
k.restore_env() # pip install from snapshot
Notifications
k.execute_notify(code,
                 notify_url="https://hooks.slack.com/...",
                 label="Training")
# Sends Slack message: "[kgz] Training completed (45.2s)"
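Slack incoming webhooks accept a JSON body with a `text` field, so the message above reduces to a payload like this (the exact message format kgz sends is an assumption based on the example):

```python
import json

# Illustrative payload for a Slack incoming webhook, mirroring the
# "[kgz] Training completed (45.2s)" message shown above.
def notify_payload(label, elapsed_seconds, failed=False):
    status = "failed" if failed else "completed"
    return json.dumps({"text": f"[kgz] {label} {status} ({elapsed_seconds:.1f}s)"})

print(notify_payload("Training", 45.2))
# {"text": "[kgz] Training completed (45.2s)"}
```

Delivering it is a single HTTP POST of that body, with Content-Type application/json, to the webhook URL.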
Session Persistence
k.save_session() # Save to ~/.kgz/{name}.json
k = Kernel.resume("my-session") # Resume in a new script
Kernel.list_sessions() # List all saved
Notebook Export
k.to_notebook("output.ipynb")
# Exports execution history as a real Jupyter notebook
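"A real Jupyter notebook" here means the nbformat v4 JSON structure. A minimal sketch of what such an exporter emits, with one code cell per history entry (outputs omitted; kgz also fills those in):

```python
# Illustrative only: the minimal nbformat v4 skeleton an exporter produces.
def to_notebook(history):
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "source": code, "outputs": []}
            for code in history
        ],
    }

nb = to_notebook(["import jax", "print('hello')"])
print(len(nb["cells"]))  # 2
```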
Parallel Execution
Run on multiple Kaggle sessions simultaneously:
k1 = Kernel(url1)
k2 = Kernel(url2)
results = Kernel.parallel_execute([k1, k2],
                                  "import jax; print(jax.devices())")
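Since each kernel call is I/O-bound, fan-out like this maps naturally onto a thread pool. A sketch of the pattern (an assumption about the mechanism; kgz may use threads or async internally):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative fan-out: run the same code on several kernel objects at once.
# Results come back in kernel order, not completion order.
def parallel_execute(kernels, code):
    with ThreadPoolExecutor(max_workers=len(kernels)) as pool:
        return list(pool.map(lambda k: k.execute(code), kernels))
```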
CLI
kgz run URL "code" # Execute code
kgz exec URL -f script.py # Execute local file
kgz status URL # idle/busy
kgz interrupt URL # Ctrl-C
kgz wait URL # Block until idle
kgz restart URL # Restart kernel
kgz upload URL file [remote] # Upload file
kgz download URL remote [local] # Download file
kgz ls URL [path] # List files
kgz info URL # Kernel info
kgz snapshot URL # Variable inspection
kgz resources URL # GPU/CPU usage
kgz sync URL local_dir # Watch & sync
kgz notebook URL -f cells.txt # Run notebook
kgz sessions # List saved sessions
Kaggle Limits
| Resource | Limit |
|---|---|
| GPU | 30 hours/week |
| TPU | 20 hours/week |
| GPU session | 12 hours max |
| TPU session | 9 hours max |
| Disk | 73 GB |
| RAM | 13-30 GB (depends on GPU) |
Use k.quota_summary() to check remaining time.
GPU & TPU Detection
kgz works with both Kaggle GPU (T4) and TPU (v3-8) accelerators:
k.is_tpu() # True if TPU kernel
k.tpu_type() # "TPU v3-8" or "Tesla T4"
k.device_info() # Full details: backend, device_count, platform per device
Health Dashboard
k.health_check()
# Kaggle Kernel Health
# ==================================================
# Kernel: busy
# Backend: gpu (2 devices)
# GPU 0: 85% util, 12000/15360 MB
# GPU 1: 82% util, 11500/15360 MB
# CPU: 17%
# RAM: 8.6/33.7 GB
# Train: step 1234/5000 | loss 2.31 | 56,000 tok/s
# ETA: ~35m
# Quota: 27.9h remaining (GPU)
# Session: 9.8h before expiry
Training Progress Parsing
kgz automatically parses metrics from your training output:
progress = k.training_progress()
# {"step": 1234, "total_steps": 5000, "loss": 2.31, "tok_per_sec": 56000}
Supports common log formats from JAX, PyTorch, flaxchat, nanochat.
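For one concrete format, the dashboard line `step 1234/5000 | loss 2.31 | 56,000 tok/s` can be parsed with a single regex. A sketch (the actual patterns kgz recognizes are broader):

```python
import re

# Illustrative parser for one metrics-line format shown in the dashboard.
LINE_RE = re.compile(
    r"step\s+(?P<step>\d+)/(?P<total>\d+).*?loss\s+(?P<loss>[\d.]+)"
    r".*?(?P<tps>[\d,]+)\s*tok/s")

def parse_progress(line):
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {"step": int(m["step"]), "total_steps": int(m["total"]),
            "loss": float(m["loss"]),
            "tok_per_sec": int(m["tps"].replace(",", ""))}

p = parse_progress("step 1234/5000 | loss 2.31 | 56,000 tok/s")
print(p)
# {'step': 1234, 'total_steps': 5000, 'loss': 2.31, 'tok_per_sec': 56000}
```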
Budget Alerts
Kaggle quota is limited. Set a budget to avoid running out:
k.set_budget(max_hours=8, notify_url="https://hooks.slack.com/...")
# Alerts at 6.4h (80%), interrupts at 8h
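The alert-then-interrupt behavior reduces to two thresholds. A sketch of the logic, with the 80% warning fraction inferred from the example above (an assumption, not a documented constant):

```python
# Illustrative budget logic: warn at a fraction of the budget, stop at the cap.
def budget_action(used_hours, max_hours, warn_fraction=0.8):
    if used_hours >= max_hours:
        return "interrupt"
    if used_hours >= max_hours * warn_fraction:
        return "alert"
    return "ok"

print(budget_action(6.5, 8))  # alert
print(budget_action(8.0, 8))  # interrupt
```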
Kaggle Datasets
Kaggle datasets are pre-mounted at /kaggle/input/:
k.list_datasets()
# gsm8k: 3 files
# tiny-shakespeare: 1 files
k.attach_dataset("openai/gsm8k")
# Dataset mounted: /kaggle/input/gsm8k (3 files)
# train.jsonl: 12.4 MB
# test.jsonl: 1.2 MB
Profiles
Save kernel configurations for reuse across sessions:
k.save_profile("gpu-training")
# Later, in a new script:
k = Kernel.from_profile("gpu-training")
Audit Log
Every action is logged for debugging:
from kgz.audit import print_history
print_history()
# 2026-04-04 10:23 my-session execute
# 2026-04-04 10:24 my-session execute_cached
# 2026-04-04 10:25 my-session interrupt