Implement SQL Lint with Bazel and SQLFluff

Jack Zhai
3 min readApr 17, 2023

Background

After SQL version control, SQL lint step in CI/CD pipeline is required. The benefit of it is that we are able to find out the issues from SQL before we execute it.

In this blog, we’ll combine Bazel and SQLFluff[1] to implement linting SQL.

SQLFluff is a SQL lint tool written in Python and supported a lot of SQL dialects.

The features it have:

  1. 1. Supporting a lot of SQL dialects, like Snowflake、PostgreSQL、ClickHouse, etc. Here is the list: https://docs.sqlfluff.com/en/stable/dialects.html .
  2. 2. Able to output the correct SQL. It can reduce our effort to fix SQL issue.
  3. 3. At the same time, it supports the use of command lines and API calls.

Put it into CI/CD Pipeline

In my opinion, there are two ways to implement SQL lint in CI/CD pipeline:

  • way 1:Add a SQL lint step into pipeline;
  • way 2: Put the SQL lint logic into the testing code. When you run test command, it run automatically.

Way 2 is my favorite. I will explain it at the end of this blog.

Project Structure

.
├── BUILD.bazel
├── WORKSPACE
├── repository-hibernate-impl
│ ├── BUILD.bazel
│ └── src
│ ├── main
│ │ └── sql
│ │ └── V1__runbook_table.sql
│ └── test
│ └── python
│ ├── BUILD.bazel
│ ├── requirements_lock.txt
│ └── sql_test.py

Step 1: Add Python Dependencies in WORKSPACE file

This article is based on Bazel 5.4.0, so we defined external dependencies in WORKSPACE file.

http_archive(  
name = "rules_python",
sha256 = "a644da969b6824cc87f8fe7b18101a8a6c57da5db39caa6566ec6109f37d2141",
strip_prefix = "rules_python-0.20.0",
url = "https://github.com/bazelbuild/rules_python/releases/download/0.20.0/rules_python-0.20.0.tar.gz",
)

load("@rules_python//python:repositories.bzl", "py_repositories")


py_repositories()

load("@rules_python//python:repositories.bzl", "python_register_toolchains")

python_register_toolchains(
name = "python3_11",
python_version = "3.11",
)

load("@python3_11//:defs.bzl", interpreter_3_11 = "interpreter")

load("@rules_python//python:pip.bzl", "pip_parse")

# Create a central repo that knows about the dependencies needed from
# requirements_lock.txt.
pip_parse(
name = "pip_deps",
python_interpreter_target = interpreter_3_11,
requirements_lock = "//repository-hibernate-impl/src/test/python:requirements_lock.txt",
)
# Load the starlark macro which will define your dependencies.
load("@pip_deps//:requirements.bzl", "install_deps")
# Call it to define repos for your requirements.
install_deps()

Step 2: Define SQLFluff Dependencies

The content of requirements_lock.txt file:

sqlfluff==2.0.5  
Jinja2==3.1.2
MarkupSafe==2.1.2
Pygments==2.15.0
appdirs==1.4.4
chardet==5.1.0
click==8.1.3
colorama==0.4.6
diff_cover==7.5.0
iniconfig==2.0.0
packaging==23.1.0
pathspec==0.11.1
pluggy==1.0.0
pytest==7.3.1
tomli==2.0.1
toml==0.10.2
exceptiongroup==1.1.1
pyyaml==6.0
regex===2023.3.23
tblib==1.7.0
tqdm==4.65.0
typing_extensions==4.5.0

Step 3: Define BUILD target

load("@pip_deps//:requirements.bzl", "requirement")  
load("@rules_python//python:defs.bzl", "py_test")

py_test(
name = "sql_test",
srcs = ["sql_test.py"],
# data attribute value is the sql's location
data = [ "//repository-hibernate-impl:sqlTest",],
deps = [
requirement("sqlfluff"),
requirement("Jinja2"),
requirement("MarkupSafe"),
requirement("Pygments"),
requirement("appdirs"),
requirement("chardet"),
requirement("click"),
requirement("colorama"),
requirement("diff_cover"),
requirement("iniconfig"),
requirement("packaging"),
requirement("pathspec"),
requirement("pluggy"),
requirement("pytest"),
requirement("tomli"),
requirement("toml"),
requirement("exceptiongroup"),
requirement("pyyaml"),
requirement("regex"),
requirement("tblib"),
requirement("tqdm"),
requirement("typing_extensions"),
],
)

NOTE: The target of SQL files in repository-hibernate-impl/BUILD.bazel is as followed:

filegroup(  
name = "sqlTest",
testonly = 1,
srcs = glob(["src/main/sql/*.sql"]),
visibility = ["//visibility:public"],
)

Step 4: Run SQL Lint

All the SQL lint logics are written in a python unit test.

import unittest  
import sqlfluff
import os
import codecs

sqls_path = os.path.join(os.getcwd(), "repository-hibernate-impl/src/main/sql/")

dialect = "postgres"

class TestSum(unittest.TestCase):
def test_lint_sql(self):
sql_dir_files = os.listdir(sqls_path)
# assert that there's a sql file at least in the dir.
self.assertTrue(len(sql_dir_files) > 0)
for sql_filename in sql_dir_files:
if sql_filename.endswith(".sql"):
f = codecs.open(os.path.join(sqls_path, sql_filename), "r", "utf-8")
sql_content = f.read()
lint_result = sqlfluff.lint(sql_content, dialect=dialect)
# if there's sql issue then
if len(lint_result) > 0:
# get the fixed sql content from SQLFluff
fix_result = sqlfluff.fix(sql_content, dialect=dialect)
# output the fixed sql content
print("correct sql should be: \n" + fix_result)
self.assertEqual(len(lint_result), 0)

if __name__ == "__main__":
unittest.main()

Here is done that all the codes for SQL lint. Now we can run the command bazel test //..., in our local dev environment or in CI/CD pipeline, to lint our SQL file.

Why I choose Way 2

There’re 2 reasons:

  1. 1. Way 1 needs that developers commit the code to git to make it possible that running SQL lint. But way 2, developers are able to run SQL lint on their dev environment.
  2. 2. Way 2 implement build caching (naturally supported by Bazel), which can save a lot of build costs.

References

[1] SQLFluff: https://github.com/sqlfluff/sqlfluff

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Jack Zhai
Jack Zhai

Written by Jack Zhai

DevOps,SRE,Bazel The Author of 《Jenkins2.x In Practice》, https://showme.codes

No responses yet

Write a response