Development
1 - Supported Platform
As an open-source database, HoraeDB can be deployed on Intel/ARM-based servers and in common virtual environments.
OS | Status |
---|---|
Ubuntu LTS 16.04 or later | ✅ |
CentOS 7.3 or later | ✅ |
Red Hat Enterprise Linux 7.3 or later 7.x releases | ✅ |
macOS 11 or later | ✅ |
Windows | ❌ |
- In production environments, Linux is the preferred platform.
- macOS is mainly used for development.
2 - Conventional Commit Guide
This document describes how we use conventional commit in our development.
Structure
We would like to structure our commit message like this:
<type>[optional scope]: <description>
There are three parts: type classifies the kind of work this commit does, scope is an optional field that provides additional context, and description summarizes the commit.
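To make the three parts concrete, here is a minimal sketch of a header parser that uses only the standard library. The names CommitHeader and parse_header are hypothetical, and the rule that a type is a single lowercase word is an assumption made for this illustration, not something the project enforces.

```rust
/// Parsed form of a header such as `feat(server): add health endpoint`.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq)]
pub struct CommitHeader<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub description: &'a str,
}

/// Splits `<type>[optional scope]: <description>` into its parts.
/// Returns `None` when the header does not match that shape.
pub fn parse_header(header: &str) -> Option<CommitHeader<'_>> {
    // The first ": " separates the type/scope prefix from the description.
    let (prefix, description) = header.split_once(": ")?;
    let description = description.trim();
    if description.is_empty() {
        return None;
    }

    // An optional scope is written in parentheses right after the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')')?;
            if scope.is_empty() {
                return None;
            }
            (kind, Some(scope))
        }
        None => (prefix, None),
    };

    // Assumption: a type is one non-empty lowercase ASCII word.
    if kind.is_empty() || !kind.chars().all(|c| c.is_ascii_lowercase()) {
        return None;
    }

    Some(CommitHeader { kind, scope, description })
}
```

A header without a scope, such as `fix: handle empty batch`, parses with `scope` set to `None`, which matches the scope being optional.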
Type
Here we list some common types and their meanings.
- feat: Implement a new feature.
- fix: Patch a bug.
- docs: Add document or comment.
- build: Change the build script or configuration.
- style: Style change (only). No logic involved.
- refactor: Refactor an existing module for performance, structure, or other reasons.
- test: Enhance test coverage or sqlness.
- chore: None of the above.
Scope
The scope is more flexible than type, and it may take different values under different types.
For example, in a feat or build commit we may use the code module to define the scope, like
feat(cluster):
feat(server):
build(ci):
build(image):
And in docs or refactor commits we prefer to label the scope with the motivation, like
docs(comment):
docs(post):
refactor(perf):
refactor(usability):
But you don’t need to add a scope every time. This isn’t mandatory. It’s just a way to help describe the commit.
After all
There are many other rules and scenarios on the Conventional Commits website. We are still exploring a better and friendlier workflow. Please let us know by opening an issue if you have any suggestions ❤️
3 - Compile
In order to compile HoraeDB, some relevant dependencies (including the Rust toolchain) should be installed.
Dependencies
Ubuntu
Assuming the development environment is Ubuntu 20.04, execute the following command to install the required dependencies:
|
|
macOS
If the development environment is macOS, execute the following commands to install the required dependencies.
- Install command line tools:
xcode-select --install
- Install protobuf:
brew install protobuf
Rust
Rust can be installed with rustup. After installing rustup, the Rust version specified in the rust-toolchain file is downloaded automatically when you enter the HoraeDB project.
After installation, you need to add environment variables to use the Rust toolchain. Basically, just put the following command into your ~/.bashrc or ~/.bash_profile:
source "$HOME/.cargo/env"
Compile and Run
horaedb-server
Compile horaedb-server with the following command in the project root directory:
cargo build
Then you can run it using the default configuration file provided in the codebase.
|
|
Tips
When compiling on macOS, you may encounter the following errors:
IO error: while open a file for lock: /var/folders/jx/grdtrdms0zl3hy6zp251vjh80000gn/T/.tmpmFOAF9/manifest/LOCK: Too many open files
or
error: could not compile `regex-syntax` (lib)
warning: build failed, waiting for other jobs to finish...
LLVM ERROR: IO failure on output stream: File too large
error: could not compile `syn` (lib)
To fix those, you should adjust ulimit as follows:
|
|
horaemeta-server
Building horaemeta-server requires Go 1.21 or later; install it before compiling.
Then, in the horaemeta directory, execute:
|
|
Then you can run horaemeta-server like this:
|
|
4 - Profiling
CPU profiling
HoraeDB provides a CPU profiling HTTP API at debug/profile/cpu.
Example:
// 60s cpu sampling data
curl 0:5000/debug/profile/cpu/60
// Output file path.
/tmp/flamegraph_cpu.svg
Heap profiling
HoraeDB provides a heap profiling HTTP API at debug/profile/heap.
Install dependencies
sudo yum install -y jemalloc-devel ghostscript graphviz
Example:
// enable malloc prof
export MALLOC_CONF=prof:true
// run horaedb-server
./horaedb-server ....
// 60s heap sampling data
curl -L '0:5000/debug/profile/heap/60' > /tmp/heap_profile
jeprof --show_bytes --pdf /usr/bin/horaedb-server /tmp/heap_profile > profile_heap.pdf
jeprof --show_bytes --svg /usr/bin/horaedb-server /tmp/heap_profile > profile_heap.svg
5 - Rationale and Goals
As every Rust programmer knows, the language has many powerful features, and there are often several patterns which can express the same idea. Also, as every professional programmer comes to discover, code is almost always read far more than it is written.
Thus, we choose to use a consistent set of idioms throughout our code so that it is easier to read and understand for both existing and new contributors.
Unsafe and Platform-Dependent conditional compilation
Avoid unsafe Rust
One of the main reasons to use Rust as an implementation language is its strong memory safety guarantees. Almost all of these guarantees are voided by the use of unsafe. Thus, unless there is an excellent reason and the use is discussed beforehand, it is unlikely HoraeDB will accept patches with unsafe code.
We may consider taking unsafe code given:
- performance benchmarks showing a very compelling improvement
- a compelling explanation of why the same performance cannot be achieved using safe code
- tests showing how it works safely across threads
Avoid platform-specific conditional compilation cfg
We hope that HoraeDB is usable across many different platforms and operating systems, which means we put a high value on standard Rust.
While some performance-critical code may require architecture-specific instructions (e.g. AVX512), most of the code should not.
Errors
All errors should follow the SNAFU crate philosophy and use SNAFU functionality
Good:
- Derives Snafu and Debug functionality
- Has a useful, end-user-friendly display message
|
|
Bad:
|
|
Use the ensure! macro to check a condition and return an error
Good:
- Reads more like an assert!
- Is more concise
|
|
Bad:
|
|
Errors should be defined in the module where they are instantiated
Good:
- Groups related error conditions together most closely with the code that produces them
- Reduces the need to match on unrelated errors that would never happen
|
|
Bad:
|
|
The Result type alias should be defined in each module
Good:
- Reduces repetition
|
|
Bad:
|
|
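As a minimal sketch of the alias pattern, using only the standard library rather than SNAFU, a module can define its own Error and a Result alias next to the code that produces those errors. The module name manifest, the variants, and the 16-byte limit are hypothetical.

```rust
pub mod manifest {
    use std::fmt;

    /// Errors raised by this module only (hypothetical variants).
    #[derive(Debug)]
    pub enum Error {
        EmptyName,
        NameTooLong { len: usize, max: usize },
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::EmptyName => write!(f, "table name must not be empty"),
                Error::NameTooLong { len, max } => {
                    write!(f, "table name has {len} bytes, the limit is {max}")
                }
            }
        }
    }

    impl std::error::Error for Error {}

    /// Module-local alias: signatures below say `Result<T>` instead of
    /// repeating `std::result::Result<T, Error>` everywhere.
    pub type Result<T> = std::result::Result<T, Error>;

    /// Assumed limit, chosen only for the example.
    const MAX_NAME_LEN: usize = 16;

    /// Returns the name unchanged when it passes both checks.
    pub fn validate_name(name: &str) -> Result<&str> {
        if name.is_empty() {
            return Err(Error::EmptyName);
        }
        if name.len() > MAX_NAME_LEN {
            return Err(Error::NameTooLong {
                len: name.len(),
                max: MAX_NAME_LEN,
            });
        }
        Ok(name)
    }
}
```

Callers outside the module refer to `manifest::Result<T>` and `manifest::Error`, so the error type stays grouped with the code that can produce it.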
Err variants should be returned with fail()
Good:
|
|
Bad:
|
|
Use context to wrap underlying errors into module-specific errors
Good:
- Reduces boilerplate
|
|
Bad:
|
|
Hint for Box<dyn std::error::Error> in Snafu:
If your error contains a trait object (e.g. Box<dyn std::error::Error + Send + Sync>), you need to wrap the error in a Box in order to use context(). We provide a box_err function to help with this conversion:
|
|
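To illustrate the idea behind such a helper, the sketch below defines a box_err extension method using only the standard library. The trait BoxError, the alias GenericError, and the ParsePortError type are hypothetical names for this example and may not match the project's helper; the wrapping step is done with plain map_err here, standing in for SNAFU's context().

```rust
use std::error::Error as StdError;
use std::fmt;

/// Boxed, type-erased error that can cross thread boundaries.
pub type GenericError = Box<dyn StdError + Send + Sync>;

/// Hypothetical extension trait that adds `box_err()` to any `Result`.
pub trait BoxError {
    type Item;
    fn box_err(self) -> Result<Self::Item, GenericError>;
}

impl<T, E> BoxError for Result<T, E>
where
    E: StdError + Send + Sync + 'static,
{
    type Item = T;

    fn box_err(self) -> Result<T, GenericError> {
        // Erase the concrete error type by boxing it.
        self.map_err(|e| Box::new(e) as GenericError)
    }
}

/// Module error whose source is stored as a trait object.
#[derive(Debug)]
pub struct ParsePortError {
    pub input: String,
    pub source: GenericError,
}

impl fmt::Display for ParsePortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid port {:?}", self.input)
    }
}

impl StdError for ParsePortError {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        // Drop the Send + Sync markers to match the trait signature.
        let inner: &(dyn StdError + 'static) = &*self.source;
        Some(inner)
    }
}

/// Parses a port, boxing the low-level error before wrapping it.
pub fn parse_port(input: &str) -> Result<u16, ParsePortError> {
    input
        .trim()
        .parse::<u16>()
        .box_err()
        .map_err(|source| ParsePortError {
            input: input.to_string(),
            source,
        })
}
```

Boxing first means the module error needs only one source field type, whatever concrete error the lower layer returns.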
Each error cause in a module should have a distinct Error enum variant
Specific error types are preferred over a generic error with a message or kind field.
Good:
- Makes it easier to track down the offending code based on a specific failure
- Reduces the size of the error enum (String is 3x 64-bit vs no space)
- Makes it easier to remove vestigial errors
- Is more concise
|
|
Bad:
|
|
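The trade-off can be sketched with the standard library alone. WriteError, GenericWriteError, and their variants are hypothetical; the size comparison relies on String being three machine words, which matches the "3x 64-bit" figure on 64-bit targets.

```rust
use std::fmt;
use std::mem::size_of;

/// Preferred: one variant per cause, no free-form message.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    TableNotFound,
    SchemaMismatch,
    Timeout,
}

/// Discouraged: a catch-all error that only carries a message.
#[derive(Debug)]
pub struct GenericWriteError {
    pub message: String,
}

impl fmt::Display for WriteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            WriteError::TableNotFound => "table not found",
            WriteError::SchemaMismatch => "row does not match the table schema",
            WriteError::Timeout => "write timed out",
        };
        f.write_str(text)
    }
}

/// Callers can branch on the cause without parsing a message string.
pub fn is_retryable(err: &WriteError) -> bool {
    match err {
        WriteError::Timeout => true,
        WriteError::TableNotFound | WriteError::SchemaMismatch => false,
    }
}

/// Returns (size of the specific enum, size of the message-based error).
pub fn error_sizes() -> (usize, usize) {
    (size_of::<WriteError>(), size_of::<GenericWriteError>())
}
```

Because the match in is_retryable is exhaustive, adding or removing a variant forces every call site to be revisited, which is what makes vestigial errors easy to spot.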
Leaf errors should contain a backtrace
In order to make debugging easier, leaf errors in an error chain should contain a backtrace.
|
|
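As a std-only sketch of the same idea (not SNAFU's own mechanism), a leaf error can capture a std::backtrace::Backtrace at the point where it is created. InvalidShardId and check_shard_id are hypothetical names, and force_capture is used here so that the example does not depend on the RUST_BACKTRACE environment variable.

```rust
use std::backtrace::Backtrace;
use std::fmt;

/// Leaf error: it has no source, so it records where it was created.
#[derive(Debug)]
pub struct InvalidShardId {
    pub id: u32,
    pub backtrace: Backtrace,
}

impl InvalidShardId {
    pub fn new(id: u32) -> Self {
        Self {
            id,
            // Capture regardless of the RUST_BACKTRACE setting.
            backtrace: Backtrace::force_capture(),
        }
    }
}

impl fmt::Display for InvalidShardId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "shard id {} is out of range", self.id)
    }
}

impl std::error::Error for InvalidShardId {}

/// Returns the id when it is below `shard_count`, else a leaf error.
pub fn check_shard_id(id: u32, shard_count: u32) -> Result<u32, InvalidShardId> {
    if id < shard_count {
        Ok(id)
    } else {
        Err(InvalidShardId::new(id))
    }
}
```

Only the leaf needs the backtrace; errors higher in the chain can reach it through their source.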
Tests
Don’t return Result from test functions
At the time of this writing, if you return Result from test functions to use ? in the test function body and an Err value is returned, the test failure message is not particularly helpful. Therefore, prefer not having a return type for test functions and instead using expect or unwrap in test function bodies.
Good:
|
|
Bad:
|
|
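A minimal sketch of the preferred shape, with a hypothetical parse_shard_count function under test:

```rust
use std::num::ParseIntError;

/// Hypothetical function under test: parses a shard count, ignoring padding.
pub fn parse_shard_count(input: &str) -> Result<u32, ParseIntError> {
    input.trim().parse::<u32>()
}

#[cfg(test)]
mod tests {
    use super::*;

    // No return type: on failure, `expect` panics at this line and
    // prints the message together with the underlying error.
    #[test]
    fn parses_padded_number() {
        let count = parse_shard_count(" 8 ").expect("padded digits should parse");
        assert_eq!(count, 8);
    }

    // Error cases are asserted directly instead of being propagated with `?`.
    #[test]
    fn rejects_non_numeric_input() {
        assert!(parse_shard_count("eight").is_err());
    }
}
```

With this shape, a failing test reports the file and line of the failing expect or assert call.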
Thanks
The initial version of this doc was forked from influxdb_iox; thanks for their hard work.
6 - RoadMap
v0.1.0
- Standalone version, local storage
- Analytical storage format
- Support SQL
v0.2.0
- Distributed version supports static topology defined in config file.
- The underlying storage supports Aliyun OSS.
- WAL implementation based on OBKV.
v0.3.0
- Release multi-language clients, including Java, Rust and Python.
- Static cluster mode with HoraeMeta.
- Basic implementation of hybrid storage format.
v0.4.0
- Implement more sophisticated cluster solution that enhances reliability and scalability of HoraeDB.
- Set up nightly benchmark with TSBS.
v1.0.0-alpha (Released)
- Implement Distributed WAL based on Apache Kafka.
- Release Golang client.
- Improve the query performance for classic time series workloads.
- Support dynamic migration of tables in cluster mode.
v1.0.0
- Formally release HoraeDB and its SDKs with all breaking changes finished.
- Finish the majority of work related to Table Partitioning.
- Various efforts to improve query performance, especially for cloud-native cluster mode. This work includes:
  - Multi-tier cache.
  - Introduce various methods to reduce the data fetched from remote storage (improve the accuracy of SST data filtering).
  - Increase the parallelism while fetching data from remote object-store.
- Improve data ingestion performance by introducing resource control over compaction.
Afterwards
With an in-depth understanding of the time-series database and its various use cases, the majority of our work will focus on performance, reliability, scalability, ease of use, and collaborations with open-source communities.
- Add utilities that support the PromQL, InfluxQL, and OpenTSDB protocols, and so on.
- Provide basic utilities for operation and maintenance. Specifically, the following are included:
  - Deployment tools that fit well for cloud infrastructures like Kubernetes.
  - Enhance self-observability, especially by supplementing critical logs and metrics.
- Develop various tools that ease the use of HoraeDB. For example, data import and export tools.
- Explore new storage formats that will improve performance on hybrid workloads (analytical and time-series workloads).
7 - SDK Development
Rust
|
|
Python
Requirements
- Python 3.7+
The Python SDK relies on the Rust SDK, so cargo is also required. Then install the build tool maturin:
|
|
Then we can build the Python SDK:
|
|
Go
|
|
Java
Requirements
- Java 1.8
- Maven 3.6.3+
|
|