Implement SQL Lint with Bazel and SQLFluff

3 min readApr 17, 2023

Background

After SQL version control, SQL lint step in CI/CD pipeline is required. The benefit of it is that we are able to find out the issues from SQL before we execute it.

In this blog, we’ll combine Bazel and SQLFluff[1] to implement linting SQL.

SQLFluff is a SQL lint tool written in Python and supported a lot of SQL dialects.

The features it have:

1. Supporting a lot of SQL dialects, like Snowflake、PostgreSQL、ClickHouse, etc. Here is the list: https://docs.sqlfluff.com/en/stable/dialects.html .
2. Able to output the correct SQL. It can reduce our effort to fix SQL issue.
3. At the same time, it supports the use of command lines and API calls.

Put it into CI/CD Pipeline

In my opinion, there are two ways to implement SQL lint in CI/CD pipeline:

way 1：Add a SQL lint step into pipeline;
way 2: Put the SQL lint logic into the testing code. When you run test command, it run automatically.

Way 2 is my favorite. I will explain it at the end of this blog.

Project Structure

.
├── BUILD.bazel
├── WORKSPACE
├── repository-hibernate-impl
│   ├── BUILD.bazel
│   └── src
│       ├── main
│       │   └── sql
│       │       └── V1__runbook_table.sql
│       └── test
│           └── python
│               ├── BUILD.bazel
│               ├── requirements_lock.txt
│               └── sql_test.py

Step 1: Add Python Dependencies in WORKSPACE file

This article is based on Bazel 5.4.0, so we defined external dependencies in WORKSPACE file.

http_archive(  
    name = "rules_python",  
    sha256 = "a644da969b6824cc87f8fe7b18101a8a6c57da5db39caa6566ec6109f37d2141",  
    strip_prefix = "rules_python-0.20.0",  
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.20.0/rules_python-0.20.0.tar.gz",  
)  
  
load("@rules_python//python:repositories.bzl", "py_repositories")  
  
  
py_repositories()  
  
load("@rules_python//python:repositories.bzl", "python_register_toolchains")  
  
python_register_toolchains(  
    name = "python3_11",  
    python_version = "3.11",  
)  
  
load("@python3_11//:defs.bzl", interpreter_3_11 = "interpreter")  
  
load("@rules_python//python:pip.bzl", "pip_parse")  
  
# Create a central repo that knows about the dependencies needed from  
# requirements_lock.txt.  
pip_parse(  
   name = "pip_deps",  
   python_interpreter_target = interpreter_3_11,  
   requirements_lock = "//repository-hibernate-impl/src/test/python:requirements_lock.txt",  
)  
# Load the starlark macro which will define your dependencies.  
load("@pip_deps//:requirements.bzl", "install_deps")  
# Call it to define repos for your requirements.  
install_deps()

Step 2: Define SQLFluff Dependencies

The content of requirements_lock.txt file:

sqlfluff==2.0.5  
Jinja2==3.1.2  
MarkupSafe==2.1.2  
Pygments==2.15.0  
appdirs==1.4.4  
chardet==5.1.0  
click==8.1.3  
colorama==0.4.6  
diff_cover==7.5.0  
iniconfig==2.0.0  
packaging==23.1.0  
pathspec==0.11.1  
pluggy==1.0.0  
pytest==7.3.1  
tomli==2.0.1  
toml==0.10.2  
exceptiongroup==1.1.1  
pyyaml==6.0  
regex===2023.3.23  
tblib==1.7.0  
tqdm==4.65.0  
typing_extensions==4.5.0

Step 3: Define BUILD target

load("@pip_deps//:requirements.bzl", "requirement")  
load("@rules_python//python:defs.bzl", "py_test")  
  
py_test(  
    name = "sql_test",  
    srcs = ["sql_test.py"],  
    # data attribute value is the sql's location
    data = [ "//repository-hibernate-impl:sqlTest",],  
    deps = [  
       requirement("sqlfluff"),  
       requirement("Jinja2"),  
       requirement("MarkupSafe"),  
       requirement("Pygments"),  
       requirement("appdirs"),  
       requirement("chardet"),  
       requirement("click"),  
       requirement("colorama"),  
       requirement("diff_cover"),  
       requirement("iniconfig"),  
       requirement("packaging"),  
       requirement("pathspec"),  
       requirement("pluggy"),  
       requirement("pytest"),  
       requirement("tomli"),  
       requirement("toml"),  
       requirement("exceptiongroup"),  
       requirement("pyyaml"),  
       requirement("regex"),  
       requirement("tblib"),  
       requirement("tqdm"),  
       requirement("typing_extensions"),  
    ],  
)

NOTE: The target of SQL files in repository-hibernate-impl/BUILD.bazel is as followed:

filegroup(  
    name = "sqlTest",  
    testonly = 1,  
    srcs = glob(["src/main/sql/*.sql"]),  
    visibility = ["//visibility:public"],  
)

Step 4: Run SQL Lint

All the SQL lint logics are written in a python unit test.

import unittest  
import sqlfluff  
import os  
import codecs  
  
sqls_path = os.path.join(os.getcwd(), "repository-hibernate-impl/src/main/sql/")  
  
dialect = "postgres"  
  
class TestSum(unittest.TestCase):  
    def test_lint_sql(self):  
        sql_dir_files = os.listdir(sqls_path)  
        # assert that there's a sql file at least in the dir. 
        self.assertTrue(len(sql_dir_files) > 0)  
        for sql_filename in sql_dir_files:  
            if sql_filename.endswith(".sql"):  
                f = codecs.open(os.path.join(sqls_path, sql_filename), "r", "utf-8")  
                sql_content = f.read()  
                lint_result = sqlfluff.lint(sql_content, dialect=dialect)  
                # if there's sql issue then
                if len(lint_result) > 0:  
                    # get the fixed sql content from SQLFluff
                    fix_result = sqlfluff.fix(sql_content, dialect=dialect) 
                    # output the fixed sql content 
                    print("correct sql should be: \n" + fix_result)
                self.assertEqual(len(lint_result), 0)  
  
if __name__ == "__main__":  
    unittest.main()

Here is done that all the codes for SQL lint. Now we can run the command bazel test //..., in our local dev environment or in CI/CD pipeline, to lint our SQL file.

Why I choose Way 2

There’re 2 reasons:

1. Way 1 needs that developers commit the code to git to make it possible that running SQL lint. But way 2, developers are able to run SQL lint on their dev environment.
2. Way 2 implement build caching (naturally supported by Bazel), which can save a lot of build costs.