What is SQLFlow?
SQLFlow is a bridge that connects a SQL engine, e.g. MySQL, Hive, SparkSQL or SQL Server, with TensorFlow and other machine learning toolkits. SQLFlow extends the SQL language to enable model training, prediction and inference.
The current experience of development ML based applications requires a team of data engineers, data scientists, business analysts as well as a proliferation of advanced langauges and programming tools like Python, SQL, SAS, SASS, Julia, R. The fragmentation of tooling and development environment brings additional difficulties in engineering to model trainning/tunning. What if we marry the most widely used data management/processing language SQL with ML/system capabilities and let engineers with SQL skills develop advanced ML based applcations?
There are already some work in progress in the industry. We can write simple machine learning prediction (or scoring) algorithms in SQL using operators like
DOT_PRODUCT. However, this requires copy-n-pasting model parameters from the training program to SQL statements. In the commercial world, we see some proprietary SQL engines providing extensions to support machine learning capabilities.
- Microsoft SQL Server: Microsoft SQL Server has the machine learning service that runs machine learning programs in R or Python as an external script.
- Teradata SQL for DL: Teradata also provides a RESTful service, which is callable from the extended SQL SELECT syntax.
- Google BigQuery: Google BigQuery enables machine learning in SQL by introducing the
None of the existing solution solves our pain point, instead we want it to be fully extensible.
- This solution should be compatible to many SQL engines, instead of a specific version or type.
- It should support sophisticated machine learning models, including TensorFlow for deep learning and xgboost for trees.
- We also want the flexibility to configure and run cutting-edge ML algorithms including specifying feature crosses, at least, no Python or R code embedded in the SQL statements, and fully integrated with hyperparameter estimation.
How to use SQLFlow
Your feedback is our motivation to move on. Please let us know your questions, concerns, and issues by filing Github Issues.