6 Lessons for Life

The finest teacher I’ve ever known was born today. Aaron turns only 6, but I’ve learned more from him in these years than I did in the three decades prior to his arrival. When we found out what he…

Smartphone

独家优惠奖金 100% 高达 1 BTC + 180 免费旋转




Use Chaos to test the distributed system linearizability

Prelude

Using Porcupine is easy. The Porcupine has two similar verify functions, CheckOperations and CheckEvents. I usually use Event, so I will use CheckEvents for the explanation later.

The definition of Event is as below:

Assume we build a simple Key-Value store, only operating for one key. The definition of request and response is as follows:

So, a event history may look like:

At last, we will define a check model for our case and use CheckEvents to check the event list to know whether the history is linearizability or not.

The definition of Model is as below:

We ignore some irrelevant field functions, only focus two functions:Init and Step.

For our Key-Value store, Step may be as below:

For read request, we can return true only when we read the same value with last state. For write request, any OK write will return the new state.

So for the previous event history, it is obviously linearizable. But if we have following event history:

The history is not linearizable, because before read operation of ID 3, the write operation of ID 2 has already written a new value.

All above examples are easy, but in the real distributed environment, we may have three status for one operation: success, failure or timeout.

In the above example, we can add an Unknown field in the response. For an unknown response, it should return true. So for write, it is return out.Ok || out.Unknown, inp.Value.

We will add the unknown response to the end of the event history, like:

If we don’t do it, Porcupine will think the history is not linearizable.

I talk about how to use Porcupine above, now I will introduce the linearizability testing framework: Chaos.

Similar to Jepsen, Chaos also runs DB service on 5 nodes. We name node from n1 to n5, and can connect the node with the name directly, E.g., we can use ssh n1 to login to node n1.

Chaos has a control node named controller, to manage whole cluster, including initializing the testing DB, creating the associated client to run the real test, starting nemesis to disturb the cluster and verifying whether the event history is linearizable or not.

The architecture is as below:

Unlike Jepsen which uses SSH to execute remote command on the node, Chaos runs an agent on every node and let controller communicate with agent through HTTP API to execute remote command on the node. The reason is I just want to use Go functions to implement the DB and nemesis, not through Linux command like Jepsen.

But we have to start the agent on the node at first, this may cause a little un-convenient for use. Luckily, we can use scripts to make it easy.

Because Go is a static language, we need to implement the interfaces and register to Chaos if we want to test our own DB linearizability. I will use TiDB for example.

If we want to test our own DB, we should first implement the DB interface:

We will use SetUp to set up the DB cluster and use TeerDown to tear down the cluster. The name of other functions are obvious so I ignore explaining them. Name returns the name of the DB, the DB must be unique because we need to use the name to register the DB to Chaos. E.g., we use “tide” here.

The argument nodes contains all nodes, usually it is ["n1", ... "n5"].node is the name of the node which the DB runs on.

In TiDB, we will download the TiDB binary, archive the binary to the specified directory, update the configuration and then start the TiDB cluster in SetUp. We will send kill to kill the cluster in TearDown.

After we implements our own DB, we will use RegisterDB to register the DB, and then we can use the DB name to get the DB instance in the agent.

Client is used to communicate with the DB on the controller. The definition of client interface is as below:

The client uses Invoke to send command to the DB, the argument for the command request and the response of the command are all an interface type because we don’t know the real type here.

The NextRequest returns the next request to be invoked, and only the client can know how to construct the request.

In TiDB bank test case, we define a bank client, and in the NextRequest, we will choose reading all accounts’ salaries or transferring some money from one account to another.

If the NextRequest returns a read request, the response is the read salaries. If returns a transfer request, the response is OK or not. We also need to consider the unknown status in the response.

The controller will create an independent client for each node, so Chaos defines a client creator interface to create multiply clients, see:

For every client invoke, the controller will record the request and the response to a event history file. The request and response event belong to one operation.

For simplicity, we use JSON format to encode the request and response and write them to the history file line by line. After the test finished, we need to implement a record parser to parse the history file and to verify the history linearizability.

When the parser reads one record, it will first detect the record is request or response, then call the corresponding function to decode.

Please notice the OnNoopResponse here. If we meet an unknown response, we need to return nil in the OnResponse function to let Chaos ignore this event and use OnNoopResposne to generate a response to complete the previous operation.

We also need to define our own Porcupine Model and use VerifyHistory(historyFile string, m porcupine.Model, p RecordParser) to verify the history.

In the TiDB bank test case, the most important Step function of the Porcupine Model is defined as follows:

For read operation, we just check whether the balances is the same with last state’s. For transfer operation, we need to do a transfer based on the last state and return the new state.

when the controller executes the tests, it can also run some nemeses to disturb the cluster, E.g., killing all DB or dropping network packages from one node.

The definition of the nemesis interface is as below:

The name of the nemesis must be unique because it must be registered to Chaos.

We useInvoke to disturb the cluster and use Recover to do the recovery. We also need to use RegisterNemesis to register our own nemesis, so that we can use it through name in the agent.

On the controller side, we need to implement a NemesisGenerator:

The Generate function will generate a nemesis operation for each node. The nemesis operation contains the name and the arguments of the nemesis, and the running time. The controller will send the nemesis operation to the agent and let the agent to execute the associated nemesis.

After we define our own DB, client, and nemesis, we need to combine them all. We need to register DB and nemesis in the agent, E.g, for TiDB in thecmd/agent/main.go file, the register code is:

Then we start the agent and use NewController to create a controller.

The controller needs a ClientCreator and a list of nemesis generator. The configuration contains the DB name, the total number of the request and the total running time, etc.

Then we can start the controller, run the test and verify the history file.

Chaos is in the very early stage now, and needs to be improved gradually. Now I only supply a TiDB bank test case and will introduce more test cases for TiDB. It is very appreciated that you can help me add more test cases.

Add a comment

Related posts:

Lessons From a Small Boy in a Wheelchair

I am sitting in the early winter sunlight, a persistent warmth hanging inside a light, cool breeze. Beside me is a six year-old boy, his body within the confines of a wheelchair. Except his left…

The Big Brother dollar

No matter how much you save. Payments and costly dollar signs will never stop chasing you. You can feel comfortable, behind, or even well off and still hate the sight of every new invoice, statement…

Buy marijuana online canada

Marijuana is also known as pot, grass and weed but its formal name is obviously cannabis. It originates from the leaves and flowers of the plant Cannabis sativa. It is recognized as an illegal…