This is the multi-page printable view of this section. Click here to print.
SQL Syntax
- 1: Data Model
- 1.1: Data Types
- 1.2: Special Columns
- 2: Identifier
- 3: Data Definition Statements
- 3.1: ALTER TABLE
- 3.2: CREATE TABLE
- 3.3: DROP TABLE
- 4: Data Manipulation Statements
- 5: Utility Statements
- 6: Options
- 7: Scalar Functions
- 8: Aggregate Functions
1 - Data Model
This chapter introduces the data model of HoraeDB.
1.1 - Data Types
HoraeDB implements table model, and the supported data types are similar to MySQL. The following table lists the mapping relationship between MySQL and HoraeDB.
Support Data Type(case-insensitive)
SQL | HoraeDB |
---|---|
null | Null |
timestamp | Timestamp |
double | Double |
float | Float |
string | String |
Varbinary | Varbinary |
uint64 | UInt64 |
uint32 | UInt32 |
uint16 | UInt16 |
uint8 | UInt8 |
int64/bigint | Int64 |
int32/int | Int32 |
int16/smallint | Int16 |
int8/tinyint | Int8 |
boolean | Boolean |
date | Date |
time | Time |
1.2 - Special Columns
Tables in HoraeDB have the following constraints:
- Primary key is required
- The primary key must contain a time column, and can only contain one time column
- The primary key must be non-null, so all columns in primary key must be non-null.
Timestamp Column
Tables in HoraeDB must have one timestamp column maps to timestamp in timeseries data, such as timestamp in OpenTSDB/Prometheus.
The timestamp column can be set with timestamp key
keyword, like TIMESTAMP KEY(ts)
.
Tag Column
Tag
is use to defined column as tag column, similar to tag in timeseries data, such as tag in OpenTSDB and label in Prometheus.
Primary Key
The primary key is used for data deduplication and sorting. The primary key is composed of some columns and one time column. The primary key can be set in the following some ways:
- use
primary key
keyword - use
tag
to auto generate TSID, HoraeDB will use(TSID,timestamp)
as primary key - only set Timestamp column, HoraeDB will use
(timestamp)
as primary key
Notice: If the primary key and tag are specified at the same time, then the tag column is just an additional information identification and will not affect the logic.
|
|
TSID
If primary key
is not set, and tag columns is provided, TSID will auto generated from hash of tag columns.
In essence, this is also a mechanism for automatically generating id.
2 - Identifier
Identifier in HoraeDB can be used as table name, column name etc. It cannot be preserved keywords or start with number and punctuation symbols. HoraeDB allows to quote identifiers with back quotes (`). In this case it can be any string like 00_table
or select
.
3 - Data Definition Statements
This chapter introduces the data definition statements.
3.1 - ALTER TABLE
ALTER TABLE
can change the schema or options of a table.
ALTER TABLE SCHEMA
HoraeDB current supports ADD COLUMN
to alter table schema.
|
|
It now becomes:
-- DESCRIBE TABLE `t`;
name type is_primary is_nullable is_tag
t timestamp true false false
tsid uint64 true false false
a int false true false
b string false true false
ALTER TABLE OPTIONS
HoraeDB current supports MODIFY SETTING
to alter table schema.
|
|
The SQL above tries to modify the write_buffer_size
of the table, and the table’s option becomes:
|
|
Besides, the ttl
can be altered from 7 days to 10 days by such SQL:
|
|
3.2 - CREATE TABLE
Basic syntax
Basic syntax:
|
|
Column definition syntax:
|
|
Partition options syntax:
|
|
Table options syntax are key-value pairs. Value should be quoted with quotation marks ('
). E.g.:
|
|
IF NOT EXISTS
Add IF NOT EXISTS
to tell HoraeDB to ignore errors if the table name already exists.
Define Column
A column’s definition should at least contains the name and type parts. All supported types are listed here.
Column is default be nullable. i.e. NULL
keyword is implied. Adding NOT NULL
constrains to make it required.
|
|
A column can be marked as special column with related keyword.
For string tag column, we recommend to define it as dictionary to reduce memory consumption:
|
|
Engine
Specifies which engine this table belongs to. HoraeDB current support Analytic
engine type. This attribute is immutable.
Partition Options
Note: This feature is only supported in distributed version.
|
|
Example below creates a table with 8 partitions, and partitioned by name
:
|
|
3.3 - DROP TABLE
Basic syntax
Basic syntax:
|
|
Drop Table
removes a specific table. This statement should be used with caution, because it removes both the table definition and table data, and this removal is not recoverable.
4 - Data Manipulation Statements
This chapter introduces the data manipulation statements.
4.1 - INSERT
Basic syntax
Basic syntax:
|
|
INSERT
inserts new rows into a HoraeDB table. Here is an example:
|
|
4.2 - SELECT
Basic syntax
Basic syntax (parts between []
are optional):
|
|
Select
syntax in HoraeDB is similar to mysql, here is an example:
|
|
5 - Utility Statements
There are serval utilities SQL in HoraeDB that can help in table manipulation or query inspection.
SHOW CREATE TABLE
|
|
SHOW CREATE TABLE
returns a CREATE TABLE
DDL that will create a same table with the given one. Including columns, table engine and options. The schema and options shows in CREATE TABLE
will based on the current version of the table. An example:
|
|
DESCRIBE
|
|
DESCRIBE
will show a detailed schema of one table. The attributes include column name and type, whether it is tag and primary key (todo: ref) and whether it’s nullable. The auto created column tsid
will also be included (todo: ref).
Example:
|
|
The result is:
name type is_primary is_nullable is_tag
t timestamp true false false
tsid uint64 true false false
a int false true false
b string false true false
EXPLAIN
|
|
EXPLAIN
shows how a query will be executed. Add it to the beginning of a query like
|
|
will give
logical_plan
Projection: #MAX(07_optimizer_t.value) AS c1, #AVG(07_optimizer_t.value) AS c2
Aggregate: groupBy=[[#07_optimizer_t.name]], aggr=[[MAX(#07_optimizer_t.value), AVG(#07_optimizer_t.value)]]
TableScan: 07_optimizer_t projection=Some([name, value])
physical_plan
ProjectionExec: expr=[MAX(07_optimizer_t.value)@1 as c1, AVG(07_optimizer_t.value)@2 as c2]
AggregateExec: mode=FinalPartitioned, gby=[name@0 as name], aggr=[MAX(07_optimizer_t.value), AVG(07_optimizer_t.value)]
CoalesceBatchesExec: target_batch_size=4096
RepartitionExec: partitioning=Hash([Column { name: \"name\", index: 0 }], 6)
AggregateExec: mode=Partial, gby=[name@0 as name], aggr=[MAX(07_optimizer_t.value), AVG(07_optimizer_t.value)]
ScanTable: table=07_optimizer_t, parallelism=8, order=None
6 - Options
Options below can be used when create table for analytic engine
enable_ttl
,bool
. When enable TTL on a table, rows older thanttl
will be deleted and can’t be querid, defaulttrue
ttl
,duration
, lifetime of a row, only used whenenable_ttl
istrue
. default7d
.storage_format
,string
. The underlying column’s format. Availiable values:columnar
, defaulthybrid
, Note: This feature is still in development, and it may change in the future.
The meaning of those two values are in Storage format section.
Storage Format
There are mainly two formats supported in analytic engine. One is columnar
, which is the traditional columnar format, with one table column in one physical column:
|
|
The other one is hybrid
, an experimental format used to simulate row-oriented storage in columnar storage to accelerate classic time-series query.
In classic time-series user cases like IoT or DevOps, queries will typically first group their result by series id(or device id), then by timestamp. In order to achieve good performance in those scenarios, the data physical layout should match this style, so the hybrid
format is proposed like this:
|
|
- Within one file, rows belonging to the same primary key(eg: series/device id) are collapsed into one row
- The columns besides primary key are divided into two categories:
collapsible
, those columns will be collapsed into a list. Used to encodefields
in time-series table- Note: only fixed-length type is supported now
non-collapsible
, those columns should only contain one distinct value. Used to encodetags
in time-series table- Note: only string type is supported now
- Two more columns are added,
minTime
andmaxTime
. Those are used to cut unnecessary rows out in query.- Note: Not implemented yet.
Example
|
|
This will create a table with hybrid format, users can inspect data format with parquet-tools. The table above should have following parquet schema:
message arrow_schema {
optional group ts (LIST) {
repeated group list {
optional int64 item (TIMESTAMP(MILLIS,false));
}
}
required int64 tsid (INTEGER(64,false));
optional binary tag1 (STRING);
optional binary tag2 (STRING);
optional group value1 (LIST) {
repeated group list {
optional double item;
}
}
optional group value2 (LIST) {
repeated group list {
optional int32 item;
}
}
}
7 - Scalar Functions
HoraeDB SQL is implemented with DataFusion, Here is the list of scalar functions. See more detail, Refer to Datafusion
Math Functions
Function | Description |
---|---|
abs(x) | absolute value |
acos(x) | inverse cosine |
asin(x) | inverse sine |
atan(x) | inverse tangent |
atan2(y, x) | inverse tangent of y / x |
ceil(x) | nearest integer greater than or equal to argument |
cos(x) | cosine |
exp(x) | exponential |
floor(x) | nearest integer less than or equal to argument |
ln(x) | natural logarithm |
log10(x) | base 10 logarithm |
log2(x) | base 2 logarithm |
power(base, exponent) | base raised to the power of exponent |
round(x) | round to nearest integer |
signum(x) | sign of the argument (-1, 0, +1) |
sin(x) | sine |
sqrt(x) | square root |
tan(x) | tangent |
trunc(x) | truncate toward zero |
Conditional Functions
Function | Description |
---|---|
coalesce | Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. |
nullif | Returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the coalesce expression. |
String Functions
Function | Description |
---|---|
ascii | Returns the numeric code of the first character of the argument. In UTF8 encoding, returns the Unicode code point of the character. In other multibyte encodings, the argument must be an ASCII character. |
bit_length | Returns the number of bits in a character string expression. |
btrim | Removes the longest string containing any of the specified characters from the start and end of string. |
char_length | Equivalent to length. |
character_length | Equivalent to length. |
concat | Concatenates two or more strings into one string. |
concat_ws | Combines two values with a given separator. |
chr | Returns the character based on the number code. |
initcap | Capitalizes the first letter of each word in a string. |
left | Returns the specified leftmost characters of a string. |
length | Returns the number of characters in a string. |
lower | Converts all characters in a string to their lower case equivalent. |
lpad | Left-pads a string to a given length with a specific set of characters. |
ltrim | Removes the longest string containing any of the characters in characters from the start of string. |
md5 | Calculates the MD5 hash of a given string. |
octet_length | Equivalent to length. |
repeat | Returns a string consisting of the input string repeated a specified number of times. |
replace | Replaces all occurrences in a string of a substring with a new substring. |
reverse | Reverses a string. |
right | Returns the specified rightmost characters of a string. |
rpad | Right-pads a string to a given length with a specific set of characters. |
rtrim | Removes the longest string containing any of the characters in characters from the end of string. |
digest | Calculates the hash of a given string. |
split_part | Splits a string on a specified delimiter and returns the specified field from the resulting array. |
starts_with | Checks whether a string starts with a particular substring. |
strpos | Searches a string for a specific substring and returns its position. |
substr | Extracts a substring of a string. |
translate | Translates one set of characters into another. |
trim | Removes the longest string containing any of the characters in characters from either the start or end of string. |
upper | Converts all characters in a string to their upper case equivalent. |
Regular Expression Functions
Function | Description |
---|---|
regexp_match | Determines whether a string matches a regular expression pattern. |
regexp_replace | Replaces all occurrences in a string of a substring that matches a regular expression pattern with a new substring. |
Temporal Functions
Function | Description |
---|---|
to_timestamp | Converts a string to type Timestamp(Nanoseconds, None). |
to_timestamp_millis | Converts a string to type Timestamp(Milliseconds, None). |
to_timestamp_micros | Converts a string to type Timestamp(Microseconds, None). |
to_timestamp_seconds | Converts a string to type Timestamp(Seconds, None). |
extract | Retrieves subfields such as year or hour from date/time values. |
date_part | Retrieves subfield from date/time values. |
date_trunc | Truncates date/time values to specified precision. |
date_bin | Bin date/time values to specified precision. |
from_unixtime | Converts Unix epoch to type Timestamp(Nanoseconds, None). |
now | Returns current time as Timestamp(Nanoseconds, UTC). |
Other Functions
Function | Description |
---|---|
array | Create an array. |
arrow_typeof | Returns underlying type. |
in_list | Check if value in list. |
random | Generate random value. |
sha224 | sha224 |
sha256 | sha256 |
sha384 | sha384 |
sha512 | sha512 |
to_hex | Convert to hex. |
8 - Aggregate Functions
HoraeDB SQL is implemented with DataFusion, Here is the list of aggregate functions. See more detail, Refer to Datafusion
General
Function | Description |
---|---|
min | Returns the minimum value in a numerical column |
max | Returns the maximum value in a numerical column |
count | Returns the number of rows |
avg | Returns the average of a numerical column |
sum | Sums a numerical column |
array_agg | Puts values into an array |
Statistical
Function | Description |
---|---|
var / var_samp | Returns the variance of a given column |
var_pop | Returns the population variance of a given column |
stddev / stddev_samp | Returns the standard deviation of a given column |
stddev_pop | Returns the population standard deviation of a given column |
covar / covar_samp | Returns the covariance of a given column |
covar_pop | Returns the population covariance of a given column |
corr | Returns the correlation coefficient of a given column |
Approximate
Function | Description |
---|---|
approx_distinct | Returns the approximate number (HyperLogLog) of distinct input values |
approx_median | Returns the approximate median of input values. It is an alias of approx_percentile_cont(x, 0.5). |
approx_percentile_cont | Returns the approximate percentile (TDigest) of input values, where p is a float64 between 0 and 1 (inclusive). It supports raw data as input and build Tdigest sketches during query time, and is approximately equal to approx_percentile_cont_with_weight(x, 1, p). |
approx_percentile_cont_with_weight | Returns the approximate percentile (TDigest) of input values with weight, where w is weight column expression and p is a float64 between 0 and 1 (inclusive). It supports raw data as input or pre-aggregated TDigest sketches, then builds or merges Tdigest sketches during query time. TDigest sketches are a list of centroid (x, w), where x stands for mean and w stands for weight. |