# Project1 StandaloneKV
In this project, you will build a standalone key/value storage [gRPC](https://grpc.io/docs/guides/) service with column family support. Standalone means a single node, not a distributed system. A [column family](https://en.wikipedia.org/wiki/Standard_column_family) (abbreviated as CF below) is a term like key namespace: the same key can hold different values in different column families. You can simply regard multiple column families as separate mini databases. CFs are used to support the transaction model in project4; you will see why TinyKV needs them then.
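If the CF idea feels abstract, here is a tiny, self-contained sketch in plain Go (no TinyKV code involved) showing the semantics: the same key can map to different values under different CFs.

``` go
package main

import "fmt"

func main() {
	// Each CF behaves like an independent key space over the same store.
	store := map[string]map[string]string{
		"default": {},
		"write":   {},
	}
	store["default"]["k1"] = "v1"
	store["write"]["k1"] = "v2" // same key, different CF, different value

	fmt.Println(store["default"]["k1"], store["write"]["k1"]) // v1 v2
}
```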
The service supports four basic operations: Put/Delete/Get/Scan. It maintains a simple database of key/value pairs. Keys and values are strings. `Put` replaces the value for a particular key of the specified CF in the database, `Delete` deletes the key's value for the specified CF, `Get` fetches the current value of a key for the specified CF, and `Scan` fetches the current values of a series of keys for the specified CF.
The project can be broken down into 2 steps:

1. Implement a standalone storage engine.
2. Implement raw key/value service handlers.
### The Code
The `gRPC` server is initialized in `kv/main.go` and contains a `tinykv.Server`, which provides a `gRPC` service named `TinyKv`. It is defined by [protocol-buffers](https://developers.google.com/protocol-buffers) in `proto/proto/tinykvpb.proto`, and the details of the rpc requests and responses are defined in `proto/proto/kvrpcpb.proto`.
Generally, you don't need to change the proto files because all necessary fields have been defined for you. But if you still need to, you can modify the proto files and run `make proto` to update the related generated Go code in `proto/pkg/xxx.pb.go`.
In addition, `Server` depends on `Storage`, an interface you need to implement for the standalone storage engine located in `kv/storage/standalone_storage/standalone_storage.go`. Once the `Storage` interface is implemented in `StandaloneStorage`, you can implement the raw key/value service for the `Server` with it.
#### Implement standalone storage engine
The first mission is implementing a wrapper of the [badger](https://github.com/dgraph-io/badger) key/value API. The gRPC service depends on a `Storage` interface which is defined in `kv/storage/storage.go`. In this context, the standalone storage engine is just a wrapper of the badger key/value API, exposed through two methods:
``` go
type Storage interface {
	// Other stuff
	Write(ctx *kvrpcpb.Context, batch []Modify) error
	Reader(ctx *kvrpcpb.Context) (StorageReader, error)
}
```
`Write` should provide a way to apply a series of modifications to the inner state, which in this situation is a badger instance.
`Reader` should return a `StorageReader` that supports key/value point get and scan operations on a snapshot.
You don't need to consider `kvrpcpb.Context` for now; it's used in the following projects.
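Putting the pieces together, here is a minimal sketch of what `StandaloneStorage` could look like. It assumes the `engine_util` helpers (`PutCF`, `DeleteCF`, `GetCFFromTxn`, `NewCFIterator`) and the `storage.Modify`/`storage.Put`/`storage.Delete` types behave as in the course skeleton; treat the exact names as assumptions, not a reference solution.

``` go
// A minimal sketch, not a reference solution; helper names are assumptions.
type StandAloneStorage struct {
	db *badger.DB
}

func (s *StandAloneStorage) Write(ctx *kvrpcpb.Context, batch []storage.Modify) error {
	for _, m := range batch {
		switch data := m.Data.(type) {
		case storage.Put:
			if err := engine_util.PutCF(s.db, data.Cf, data.Key, data.Value); err != nil {
				return err
			}
		case storage.Delete:
			if err := engine_util.DeleteCF(s.db, data.Cf, data.Key); err != nil {
				return err
			}
		}
	}
	return nil
}

func (s *StandAloneStorage) Reader(ctx *kvrpcpb.Context) (storage.StorageReader, error) {
	// A read-only badger.Txn gives a consistent snapshot for point gets and scans.
	return &standAloneReader{txn: s.db.NewTransaction(false)}, nil
}

type standAloneReader struct {
	txn *badger.Txn
}

func (r *standAloneReader) GetCF(cf string, key []byte) ([]byte, error) {
	val, err := engine_util.GetCFFromTxn(r.txn, cf, key)
	if err == badger.ErrKeyNotFound {
		return nil, nil // a missing key is not an error for the callers
	}
	return val, err
}

func (r *standAloneReader) IterCF(cf string) engine_util.DBIterator {
	return engine_util.NewCFIterator(cf, r.txn)
}

func (r *standAloneReader) Close() {
	r.txn.Discard()
}
```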
> Hints:
>
> - You should use [badger.Txn](https://godoc.org/github.com/dgraph-io/badger#Txn) to implement the `Reader` function, because the transaction handle provided by badger gives a consistent snapshot of the keys and values.
> - Badger doesn't support column families. The engine_util package (`kv/util/engine_util`) simulates column families by adding a prefix to keys. For example, a key `key` that belongs to a specific column family `cf` is stored as `${cf}_${key}`. It wraps `badger` to provide operations with CFs, and also offers many useful helper functions. So you should do all read/write operations through the methods `engine_util` provides. Please read `util/engine_util/doc.go` to learn more.
> - TinyKV uses a fork of the original version of `badger` with some fixes, so just use `github.com/Connor1996/badger` instead of `github.com/dgraph-io/badger`.
> - Don't forget to call `Discard()` for badger.Txn and close all iterators before discarding; see the sketch after these hints.
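For example, a snapshot scan with a CF-aware iterator might look like this (a hedged sketch; `engine_util.NewCFIterator` and the `CfDefault` constant are assumed from the skeleton):

``` go
// A hedged sketch of a snapshot scan; engine_util names are assumptions.
func scanDefaultCF(db *badger.DB, startKey []byte) error {
	txn := db.NewTransaction(false) // read-only Txn: a consistent snapshot
	defer txn.Discard()             // discard the Txn when done...

	it := engine_util.NewCFIterator(engine_util.CfDefault, txn)
	defer it.Close() // ...but close the iterator first (defers run in reverse)

	for it.Seek(startKey); it.Valid(); it.Next() {
		val, err := it.Item().Value()
		if err != nil {
			return err
		}
		_ = val // use it.Item().Key() and val here
	}
	return nil
}
```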
#### Implement service handlers
The final step of this project is to use the implemented storage engine to build the raw key/value service handlers: RawGet/RawScan/RawPut/RawDelete. The handlers are already defined for you; you only need to fill in the implementation in `kv/server/server.go`. Once done, remember to run `make project1` to pass the test suite.
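As an illustration, a `RawGet` handler could be wired up roughly like this (a hedged sketch; the exact `kvrpcpb` field names and the `server.storage` field are assumed from the skeleton):

``` go
// A hedged sketch of RawGet, not a reference solution.
func (server *Server) RawGet(_ context.Context, req *kvrpcpb.RawGetRequest) (*kvrpcpb.RawGetResponse, error) {
	reader, err := server.storage.Reader(req.Context)
	if err != nil {
		return nil, err
	}
	defer reader.Close()

	val, err := reader.GetCF(req.Cf, req.Key)
	if err != nil {
		return nil, err
	}
	resp := &kvrpcpb.RawGetResponse{Value: val}
	if val == nil {
		resp.NotFound = true
	}
	return resp, nil
}
```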
The project has 3 parts you need to do, including:
- Implement the basic Raft algorithm
- Build a fault tolerant KV server on top of Raft
- Add the support of raftlog GC and snapshot
## Part A
`raft.Raft` in `raft/raft.go` provides the core of the Raft algorithm, including message handling, driving the logical clock, etc. For more implementation guides, please check `raft/doc.go`, which contains an overview of the design and what these `MessageTypes` are responsible for.

#### Leader election
To implement leader election, you may want to start with `raft.Raft.tick()`, which is used to advance the internal logical clock by a single tick and hence drive the election timeout or heartbeat timeout. You don't need to care about the message sending and receiving logic now. If you need to send out a message, just push it to `raft.Raft.msgs`; all messages the raft receives will be passed to `raft.Raft.Step()`. The test code gets messages from `raft.Raft.msgs` and passes response messages through `raft.Raft.Step()`. `raft.Raft.Step()` is the entrance of message handling; you should handle messages like `MsgRequestVote`, `MsgHeartbeat` and their responses. Please also implement the test stub functions like `raft.Raft.becomeXXX` and get them called properly; they are used to update the raft's internal state when its role changes.
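For instance, `tick()` might look roughly like this. This is a hedged sketch, not the required shape: the elapsed/timeout counters are fields you add yourself (`randomizedElectionTimeout` in particular is not in the skeleton), and `pb` is the usual alias for the `eraftpb` package.

``` go
// A hedged sketch; field names like randomizedElectionTimeout are illustrative.
func (r *Raft) tick() {
	switch r.State {
	case StateFollower, StateCandidate:
		r.electionElapsed++
		if r.electionElapsed >= r.randomizedElectionTimeout {
			r.electionElapsed = 0
			// Local messages also go through Step().
			r.Step(pb.Message{MsgType: pb.MessageType_MsgHup})
		}
	case StateLeader:
		r.heartbeatElapsed++
		if r.heartbeatElapsed >= r.heartbeatTimeout {
			r.heartbeatElapsed = 0
			r.Step(pb.Message{MsgType: pb.MessageType_MsgBeat})
		}
	}
}
```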
You can run `make project2ab` to test the implementation, and see some hints at the end of this part.
### Implement the raw node interface
`raft.RawNode` in `raft/rawnode.go` is the interface we interact with from the upper application. `raft.RawNode` contains `raft.Raft` and provides some wrapper functions like `RawNode.Tick()` and `RawNode.Step()`. It also provides `RawNode.Propose()` to let the upper application propose new raft logs.
Another important struct `Ready` is also defined here. When handling messages or advancing the logical clock, the `raft.Raft` may need to interact with the upper application, for example to:

- send messages to other peers
- save log entries to stable storage
But these interactions do not happen immediately; instead, they are encapsulated in `Ready` and returned by `RawNode.Ready()`.

You can run `make project2ac` to test the implementation and run `make project2a` to test the whole part A.
> Hints:
>
> - Add any state you need to `raft.Raft`, `raft.RaftLog`, `raft.RawNode` and the messages in `eraftpb.proto`.
> - The tests assume that the first time raft starts, it should have term 0.
> - The tests assume that a newly elected leader should append a noop entry in its term.
## Part B
In this part, you will build a fault-tolerant key/value storage service using the Raft module implemented in part A. Your key/value service will be a replicated state machine, consisting of several key/value servers that use Raft for replication. Your key/value service should continue to process client requests as long as a majority of the servers are alive and can communicate, in spite of other failures or network partitions.

In project1 you have implemented a standalone kv server, so you should already be familiar with the kv server API and the `Storage` interface.
Before introducing the code, you need to understand three terms first: `Store`, `Peer` and `Region`, which are defined in `proto/proto/metapb.proto`.

- Store stands for an instance of tinykv-server
- Peer stands for a Raft node which is running on a Store
- Region stands for a collection of Peers, also called a Raft group
First, the code that you should take a look at is `RaftStorage`, located in `kv/`.
Then, here comes the core of TinyKV — raftstore. The structure is a little complicated; read these references from TiKV for a better understanding of the design:

- <https://pingcap.com/blog-cn/the-design-and-implementation-of-multi-raft/#raftstore> (Chinese version)
- <https://pingcap.com/blog/2017-08-15-multi-raft/#raftstore> (English version)
The entrance of raftstore is `Raftstore`; see `kv/raftstore/raftstore.go`. It starts some workers to handle specific tasks asynchronously, and most of them aren't used now, so you can just ignore them. All you need to focus on is `raftWorker` (`kv/raftstore/raft_worker.go`).
Peer storage is what you interact with through the `Storage` interface in part A, but in addition to the raft log, peer storage also manages other persisted metadata, which is very important for restoring a consistent state machine after restart. Moreover, there are three important states defined in `proto/proto/raft_serverpb.proto`:

- RaftLocalState: used to store the HardState of the current Raft and the last Log Index.
- RaftApplyState: used to store the last Log Index that Raft applies and some truncated Log information.
- RegionLocalState: used to store Region information and the corresponding Peer state on this Store. Normal indicates that this Peer is normal, Applying means this Peer hasn't finished the apply-snapshot operation, and Tombstone shows that this Peer has been removed from the Region and cannot join the Raft Group.
These states are stored in two badger instances: raftdb and kvdb:

- raftdb stores the raft log and `RaftLocalState`
- kvdb stores key/value data in different column families, `RegionLocalState` and `RaftApplyState`. You can regard kvdb as the state machine mentioned in the Raft paper
Some helper functions to build the keys for these states are provided in `kv/raftstore/meta`; set them to badger with `writebatch.SetMeta()`.
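For example, persisting `RaftLocalState` through a write batch might look like this (a hedged sketch; `meta.RaftStateKey`, `WriteBatch.SetMeta` and `WriteToDB` are assumed to match the skeleton, and `rspb` is the usual alias for `raft_serverpb`):

``` go
// A hedged sketch; meta.RaftStateKey and WriteBatch.SetMeta are assumed names.
func persistRaftState(raftDB *badger.DB, regionID uint64, state *rspb.RaftLocalState) error {
	raftWB := new(engine_util.WriteBatch)
	if err := raftWB.SetMeta(meta.RaftStateKey(regionID), state); err != nil {
		return err
	}
	return raftWB.WriteToDB(raftDB) // raft states live in raftdb
}
```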
To append log entries, simply save all log entries at `raft.Ready.Entries` to raftdb.
Saving the hard state is also very easy: just update the peer storage's `RaftLocalState.HardState` and save it to raftdb.
> Hints:
>
> - Use `WriteBatch` to save these states at once.
> - See other functions at `peer_storage.go` for how to read and write these states.
### Implement Raft ready process
The Raft `RawNode` is already created with `PeerStorage` and stored in `peer`.
`HandleMsgs` processes all the messages received from raftCh, including `MsgTypeTick`, which calls `RawNode.Tick()` to drive the Raft; `MsgTypeRaftCmd`, which wraps the requests from clients; and `MsgTypeRaftMessage`, which is the message transported between Raft peers. All the message types are defined in `kv/raftstore/message/msg.go`. You can check it for details; some of them will be used in the following parts.
After the messages are processed, the Raft node should have some state updates. So `HandleRaftReady` should get the ready from the Raft module and do corresponding actions like persisting log entries, applying committed entries and sending raft messages to other peers through the network.
In pseudocode, the raftstore uses Raft like this:

``` go
for {
  select {
  case <-s.Ticker:
    Node.Tick()
  default:
    if Node.HasReady() {
      rd := Node.Ready()
      // saveToStorage, send and process stand for the real raftstore steps:
      // persist state/entries/snapshot, send messages, apply committed entries.
      saveToStorage(rd.State, rd.Entries, rd.Snapshot)
      send(rd.Messages)
      for _, entry := range rd.CommittedEntries {
        process(entry)
      }
      Node.Advance(rd)
    }
  }
}
```
After this, the whole process of a read or write request would be like this:
- Clients call the RPC RawGet/RawPut/RawDelete/RawScan
- The RPC handler calls the related `RaftStorage` method
- `RaftStorage` sends a Raft command request to raftstore, and waits for the response
- `RaftStore` proposes the Raft command request as a Raft log
- The Raft module appends the log, and persists it by `PeerStorage`
- The Raft module commits the log
- The Raft worker executes the Raft command when handling Raft ready, and returns the response by callback
- `RaftStorage` receives the response from the callback and returns it to the RPC handler
- The RPC handler does some actions and returns the RPC response to clients
You should run `make project2b` to pass all the tests. The whole test runs a mock cluster including multiple TinyKV instances with a mock network. It performs some read and write operations and checks whether the return values are as expected.
To be noted, error handling is an important part of passing the test. You may have already noticed that some errors are defined in `proto/proto/errorpb.proto`, and the error is a field of the gRPC response. Also, the corresponding errors, which implement the `error` interface, are defined in `kv/raftstore/util/error.go`, so you can use them as return values of functions.
These errors are mainly related to Region, so the error is also a member of `RaftResponseHeader` of `RaftCmdResponse`. When proposing a request or applying a command, there may be some errors. If so, you should return the raft command response with the error, and the error will be further passed to the gRPC response. You can use `BindErrResp`, provided in `kv/raftstore/cmd_resp.go`, to convert these errors to errors defined in `errorpb.proto` when returning the response with an error.
In this stage, you may consider these errors (others will be processed in project3):
- ErrNotLeader: the raft command is proposed on a follower, so use it to let the client try other peers. See the sketch after this list.
- ErrStaleCommand: it may happen that, due to leader changes, some logs are not committed and are overridden with the new leader's logs. The client doesn't know that and is still waiting for the response, so you should return this to let the client know and retry the command.
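A hedged sketch of returning such an error (the exact signature of `BindErrResp` and the `newCmdResp` helper are assumed from `kv/raftstore/cmd_resp.go`):

``` go
// A hedged sketch; BindErrResp's exact signature is an assumption.
func replyNotLeader(cb *message.Callback, regionID uint64) {
	resp := newCmdResp() // helper in kv/raftstore/cmd_resp.go
	BindErrResp(resp, &util.ErrNotLeader{RegionId: regionID})
	cb.Done(resp) // the error flows back to RaftStorage and into the gRPC response
}
```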
> Hints:
>
> - `PeerStorage` implements the `Storage` interface of the Raft module; you should use the provided method `SaveRaftReady()` to persist the Raft-related states.
> - Use `WriteBatch` in `engine_util` to make multiple writes atomic; for example, you need to make sure to apply the committed entries and update the applied index in one write batch. See the sketch after these hints.
> - Use `Transport` to send raft messages to other peers; it's in the `GlobalContext`.
> - The server should not complete a get RPC if it is not part of a majority and does not have up-to-date data. You can just put the get operation into the raft log, or implement the optimization for read-only operations described in Section 8 of the Raft paper.
> - Don't forget to update and persist the apply state when applying the log entries.
> - You can apply the committed Raft log entries asynchronously, just like TiKV does. It's not required, and improving performance this way is a big challenge.
> - Record the callback of the command when proposing, and return the callback after applying.
> - For the snap command response, you should set the badger `Txn` on the callback explicitly.
> - You can change the log level in `kv/config/config.go` to control whether to print some logs, which is useful when debugging.
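To make the atomicity hint concrete, applying entries together with the apply state in one batch could look like this (a hedged sketch; `applyEntry` is a hypothetical helper for decoding and executing one entry, and `rspb` aliases `raft_serverpb`):

``` go
// A hedged sketch; applyEntry is a hypothetical helper, not a given API.
func applyCommitted(kvDB *badger.DB, regionID uint64,
	applyState *rspb.RaftApplyState, committed []eraftpb.Entry) error {
	wb := new(engine_util.WriteBatch)
	for _, entry := range committed {
		applyEntry(wb, entry) // decode the RaftCmdRequest and stage its writes
		applyState.AppliedIndex = entry.Index
	}
	if err := wb.SetMeta(meta.ApplyStateKey(regionID), applyState); err != nil {
		return err
	}
	return wb.WriteToDB(kvDB) // one atomic write: data and applied index together
}
```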
## Part C
This exercise requires implementing a single struct called `MvccTxn`.
`MvccTxn` is defined in [transaction.go](/kv/transaction/mvcc/transaction.go). There is a stub implementation, and some helper functions for encoding and decoding keys. Tests are in [transaction_test.go](/kv/transaction/mvcc/transaction_test.go). For this exercise, you should implement each of the `MvccTxn` methods so that all tests pass. Each method is documented with its intended behaviour.
> Hints:
>
> - An `MvccTxn` should know the start timestamp of the request it is representing.
> - The most challenging methods to implement are likely to be `GetValue` and the methods for retrieving writes. You will need to use `StorageReader` to iterate over a CF. Bear in mind the ordering of encoded keys, and remember that whether a value is valid depends on the commit timestamp, not the start timestamp, of a transaction. A sketch follows these hints.
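To make the second hint concrete, here is a hedged sketch of a `GetValue`-style read path. The helpers `EncodeKey`, `DecodeUserKey`, `ParseWrite` and the `WriteKindPut` constant are assumed from the mvcc package (plus the standard `bytes` package); treat this as one possible shape, not the answer.

``` go
// A hedged sketch, not a reference solution; helper names are assumptions.
func (txn *MvccTxn) GetValue(key []byte) ([]byte, error) {
	iter := txn.Reader.IterCF(engine_util.CfWrite)
	defer iter.Close()
	// Encoded keys sort a key's versions from newest to oldest, so seeking at
	// (key, StartTS) lands on the most recent write visible to this txn.
	iter.Seek(EncodeKey(key, txn.StartTS))
	if !iter.Valid() || !bytes.Equal(DecodeUserKey(iter.Item().Key()), key) {
		return nil, nil
	}
	val, err := iter.Item().Value()
	if err != nil {
		return nil, err
	}
	write, err := ParseWrite(val)
	if err != nil {
		return nil, err
	}
	if write.Kind != WriteKindPut {
		return nil, nil // the visible version is a delete or rollback
	}
	// The write record points at the value stored under its start timestamp.
	return txn.Reader.GetCF(engine_util.CfDefault, EncodeKey(key, write.StartTS))
}
```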
## Part B
You'll need to implement the `KvGet`, `KvPrewrite`, and `KvCommit` methods defined in [server.go](/kv/server/server.go).
TinyKV can handle multiple requests concurrently, so there is the possibility of local race conditions. For example, TinyKV might receive two requests from different clients at the same time, one of which commits a key while the other rolls back the same key. To avoid race conditions, you can *latch* any key in the database. This latch works much like a per-key mutex. One latch covers all CFs. [latches.go](/kv/transaction/latches/latches.go) defines a `Latches` object which provides an API for this, sketched below.
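Usage inside a handler such as `KvPrewrite` could be as simple as this (a hedged sketch; the `WaitForLatches`/`ReleaseLatches` names and the `server.Latches` field are assumed from latches.go):

``` go
// A hedged sketch of guarding a transactional command with latches.
keysToLatch := [][]byte{req.Key}
server.Latches.WaitForLatches(keysToLatch)
defer server.Latches.ReleaseLatches(keysToLatch)
// ... build an MvccTxn, check locks, and write under the latch ...
```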
> Hints:
>
> - All commands will be a part of a transaction. Transactions are identified by a start timestamp (aka start version).
> - Any request might cause a region error; these should be handled in the same way as for the raw requests. Most responses have a way to indicate non-fatal errors for situations like a key being locked. By reporting these to the client, it can retry a transaction after some time.
## Part C
In this part, you will implement `KvScan`, `KvCheckTxnStatus`, `KvBatchRollback`, and `KvResolveLock`.
`KvResolveLock` inspects a batch of locked keys and either rolls them all back or commits them all.
> Hints:
>
> - For scanning, you might find it helpful to implement your own scanner (iterator) abstraction which iterates over logical values, rather than the raw values in the underlying storage.
> - When scanning, some errors can be recorded for an individual key and should not cause the whole scan to stop. For other commands, any single key causing an error should cause the whole operation to stop.
> - Since `KvResolveLock` either commits or rolls back its keys, you should be able to share code with the `KvBatchRollback` and `KvCommit` implementations.
> - A timestamp consists of a physical and a logical component. The physical part is roughly a monotonic version of wall-clock time. Usually we use the whole timestamp, for example when comparing timestamps for equality. However, when calculating timeouts, we must only use the physical part of the timestamp. To do this you may find the `PhysicalTime` function in [server.go](/kv/server/server.go) useful. See the sketch after these hints.
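For the last hint, the physical component is the high bits of the timestamp. A hedged sketch of what `PhysicalTime` does, assuming the 18-bit logical suffix convention TinyKV borrows from TiKV:

``` go
// A hedged sketch of extracting the physical (wall-clock) part of a timestamp.
const logicalBits = 18 // assumed TSO layout: physical ms << 18 | logical counter

func PhysicalTime(ts uint64) uint64 {
	return ts >> logicalBits
}
```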