Continued work on column-level lineage
We added DDL retrieval for the Snowflake data warehouse to enhance code navigation. Our parser now correctly recognises SAMPLE | TABLESAMPLE queries and CLUSTER BY in table creation. We also made many changes to handle DWH-specific edge cases, such as fully quoted identifiers in BigQuery, SQL functions with the same name but a flipped argument order, keywords that are generally reserved but accepted by Snowflake, and specific rules for lateral column handling.
We added support for PIVOT and UNPIVOT operations, which now correctly track multiple aggregations (in DWHs that support them). We also added support for JOINs to subqueries.
As we get closer to our planned release, we are focusing on ensuring the quality of the produced lineage results. To achieve this, we extended our test suite to cover more exotic examples of SQL syntax and added automatic coverage report generation. This allows us to track parsing/analysis completeness and prevent future regressions. With the help of Synq monitors, we track changes to the syntax coverage.
This week, we also started work to enhance our analysis with information about known tables and their columns. This will significantly improve the handling of SELECT * FROM foo UNION ALL SELECT * FROM bar and other cases where a SQL wildcard is used.
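To illustrate why known columns matter here, the following is a minimal sketch and not our actual implementation. It assumes a hypothetical in-memory catalog that maps each table name to its ordered column list, and the names expand_star and union_all_lineage are made up for this example. Without such a catalog, a wildcard cannot be resolved to concrete columns at all; with it, each output column of the UNION ALL can be traced back to the column at the same position in every branch.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog: table name -> ordered list of column names.
Catalog = Dict[str, List[str]]


def expand_star(table: str, catalog: Catalog) -> List[Tuple[str, str]]:
    """Resolve `SELECT * FROM table` into ordered (table, column) pairs."""
    if table not in catalog:
        raise KeyError(f"unknown table: {table}")
    return [(table, column) for column in catalog[table]]


def union_all_lineage(tables: List[str], catalog: Catalog) -> Dict[str, List[str]]:
    """Column lineage for `SELECT * FROM t1 UNION ALL SELECT * FROM t2 ...`.

    UNION ALL matches columns by position, and the output column names
    are taken from the first branch.
    """
    if not tables:
        raise ValueError("at least one table is required")
    branches = [expand_star(table, catalog) for table in tables]
    width = len(branches[0])
    if any(len(branch) != width for branch in branches):
        raise ValueError("UNION ALL branches have different column counts")

    lineage: Dict[str, List[str]] = {}
    for position, (_, output_name) in enumerate(branches[0]):
        # Collect the source column at this position from every branch.
        lineage[output_name] = [
            f"{table}.{column}"
            for table, column in (branch[position] for branch in branches)
        ]
    return lineage


# Assumed example schemas, for illustration only.
catalog: Catalog = {
    "foo": ["id", "name"],
    "bar": ["user_id", "full_name"],
}
result = union_all_lineage(["foo", "bar"], catalog)
```

In this sketch, result maps the output column id to foo.id and bar.user_id, and name to foo.name and bar.full_name. A real analysis also has to deal with qualified wildcards, schema drift, and dialect differences, which this example ignores.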