mirror of https://github.com/libp2p/go-libp2p-resource-manager.git (synced 2025-02-10 03:40:43 +08:00)
Merge new docs into readme
parent 712edfd4d4
commit d81430f88a
237
README.md
@@ -7,28 +7,6 @@ The implementation is based on the concept of Resource Management
 Scopes, whereby resource usage is constrained by a DAG of scopes,
 accounting for multiple levels of resource constraints.
 
-## Design Considerations
-
-- The Resource Manager must account for basic resource usage at all
-  levels of the stack, from the internals to application components
-  that use the network facilities of libp2p.
-- Basic resources include memory, streams, connections, and file
-  descriptors. These account for both space and time used by
-  the stack, as each resource has a direct effect on the system
-  availability and performance.
-- The design must support seamless integration for user applications,
-  which should reap the benefits of resource management without any
-  changes. That is, existing applications should be oblivious of the
-  resource manager and transparently obtain limits which protect it
-  from resource exhaustion and OOM conditions.
-- At the same time, the design must support opt-in resource usage
-  accounting for applications who want to explicitly utilize the
-  facilities of the system to inform about and constrain their own
-  resource usage.
-- The design must allow the user to set its own limits, which can be
-  static (fixed) or dynamic.
-
-
 ## Basic Resources
 
 ### Memory
@@ -207,7 +185,7 @@ scope.
 ### User Transaction Scopes
 
 User transaction scopes can be created as a child of any extant
-resource scope, and provide the prgrammer with a delimited scope for
+resource scope, and provide the programmer with a delimited scope for
 easy resource accounting. Transactions may form a tree that is rooted
 to some canonical scope in the scope DAG.
 
@@ -230,6 +208,133 @@ limits for the system and transient scopes, default and specific
 limits for services, protocols, and peers, and limits for connections
 and streams.
 
+### Scaling Limits
+
+When building software that is supposed to run on many different kind of machines,
+with various memory and CPU configurations, it is desireable to have limits that
+scale with the size of the machine.
+
+This is done using the `ScalingLimitConfig`. For every scope, this configuration
+struct defines the absolutely bare minimum limits, and an (optional) increase of
+these limits, which will be applied on nodes that have sufficient memory.
+
+A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
+used to initialize a fixed limiter as shown above) by calling the `Scale` method.
+The `Scale` method takes two parameters: the amount of memory and the number of file
+descriptors that an application is willing to dedicate to libp2p.
+
+These amounts will differ between use cases: A blockchain node running on a dedicated
+server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
+other end of the spectrum, a desktop companion application running as a background
+task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
+memory to libp2p.
+
+For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
+which determines the amount of memory and file descriptors available on the
+system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to
+libp2p.
+
+For example, one might set:
+```go
+var scalingLimits = ScalingLimitConfig{
+    SystemBaseLimit: BaseLimit{
+        ConnsInbound: 64,
+        ConnsOutbound: 128,
+        Conns: 128,
+        StreamsInbound: 512,
+        StreamsOutbound: 1024,
+        Streams: 1024,
+        Memory: 128 << 20,
+        FD: 256,
+    },
+    SystemLimitIncrease: BaseLimitIncrease{
+        ConnsInbound: 32,
+        ConnsOutbound: 64,
+        Conns: 64,
+        StreamsInbound: 256,
+        StreamsOutbound: 512,
+        Streams: 512,
+        Memory: 256 << 20,
+        FDFraction: 1,
+    },
+}
+```
+
+The base limit (`SystemBaseLimit`) here is the minimum configuration that any
+node will have, no matter how little memory it possesses. For every GB of memory
+passed into the `Scale` method, an increase of (`SystemLimitIncrease`) is added.
+
+For Example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
+`Conns` (128 + 4*64).
+
+The `FDFraction` defines how many of the file descriptors are allocated to this
+scope. In the example above, when called with a file descriptor value of 1000,
+this would result in a limit of 1256 file descriptors for the system scope.
+
+Note that we only showed the configuration for the system scope here, equivalent
+configuration options apply to all other scopes as well.
+
+### Default limits
+
+By default the resource manager ships with some reasonable scaling limits and
+makes a reasonable guess at how much system memory you want to dedicate to the
+go-libp2p process. For the default definitions see `DefaultLimits` and
+`ScalingLimitConfig.AutoScale()`.
+
+### Tweaking Defaults
+
+If the defaults seem mostly okay, but you want to adjust one facet you can do
+simply copy the defaults and update the field you want to change. You can
+apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
+`.Apply`.
+
+### How to tune your limits
+
+Once you've set your limits and monitoring (see below) you can now tune your
+limits better. The `blocked_resources` metric will tell you what was blocked
+and for what scope. If you see a steady stream of these blocked requests it
+means your resource limits are too low for your usage. If you see a rare sudden
+spike, this is okay and it means the resource manager protected you from some
+anamoly.
+
+### How to disable limits
+
+Sometimes disabling all limits is useful when you want to see how much
+resources you use during normal operation. You can then use this information to
+define your initial limits.
+
+### Debug "resource limit exceeded" errors
+
+These errors occur whenever we've hit a limit. For example we'll get this error
+if we are at our limit for the number of streams we can have, and we try to open
+one more.
+
+If you're seeing a lot of "resource limit exceeded" errors take a look at the
+`blocked_resources` metric for some information on what was blocked. Also take
+a look at the resources used per stream, and per protocol (the Grafana
+Dashboard is ideal for this) and check if you're routinely hitting limits or if
+these are rare (but noisy) spikes.
+
+When debugging in general, in may help to search your logs for errors that match
+the string "resource limit exceeded" to see if you're hitting some limits
+routinely.
+
+## Monitoring
+
+Once you have limits set, you'll want to monitor to see if you're running into
+your limits often. This could be a sign that you need to raise your limits
+(your process is more intensive than you originally thought) or that you need
+fix something in your application (surely you don't need over 1000 streams?).
+
+There are OpenCensus metrics that can be hooked up to the resource manager. See
+`obs/stats_test.go` for an example on how to enable this, and `DefaultViews` in
+`stats.go` for recommended views. These metrics can be hooked up to Prometheus
+or any other OpenCensus supported platform.
+
+There is also an included Grafana dashboard to help kickstart your
+observability into the resource manager. Find more information about it at
+`./obs/grafana-dashboards/README.md`.
+
 ## Examples
 
 Here we consider some concrete examples that can ellucidate the abstract
@@ -289,71 +394,6 @@ limiter := NewFixedLimiter(limits)
 ```
 The `limits` allows fine-grained control of resource usage on all scopes.
 
-### Scaling Limits
-
-When building software that is supposed to run on many different kind of machines,
-with various memory and CPU configurations, it is desireable to have limits that
-scale with the size of the machine.
-
-This is done using the `ScalingLimitConfig`. For every scope, this configuration
-struct defines the absolutely bare minimum limits, and an (optional) increase of
-these limits, which will be applied on nodes that have sufficient memory.
-
-A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
-used to initialize a fixed limiter as shown above) by calling the `Scale` method.
-The `Scale` method takes two parameters: the amount of memory and the number of file
-descriptors that an application is willing to dedicate to libp2p.
-
-These amounts will differ between use cases: A blockchain node running on a dedicated
-server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
-other end of the spectrum, a desktop companion application running as a background
-task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
-memory to libp2p.
-
-For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
-which determines the amount of memory and file descriptors available on the
-system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to libp2p.
-
-For example, one might set:
-```go
-var scalingLimits = ScalingLimitConfig{
-    SystemBaseLimit: BaseLimit{
-        ConnsInbound: 64,
-        ConnsOutbound: 128,
-        Conns: 128,
-        StreamsInbound: 512,
-        StreamsOutbound: 1024,
-        Streams: 1024,
-        Memory: 128 << 20,
-        FD: 256,
-    },
-    SystemLimitIncrease: BaseLimitIncrease{
-        ConnsInbound: 32,
-        ConnsOutbound: 64,
-        Conns: 64,
-        StreamsInbound: 256,
-        StreamsOutbound: 512,
-        Streams: 512,
-        Memory: 256 << 20,
-        FDFraction: 1,
-    },
-}
-```
-
-The base limit (`SystemBaseLimit`) here is the minimum configuration that any
-node will have, no matter how little memory it possesses. For every GB of memory
-passed into the `Scale` method, an increase of (`SystemLimitIncrease`) is added.
-
-For Example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
-`Conns` (128 + 4*64).
-
-The `FDFraction` defines how many of the file descriptors are allocated to this
-scope. In the example above, when called with a file descriptor value of 1000,
-this would result in a limit of 1256 file descriptors for the system scope.
-
-Note that we only showed the configuration for the system scope here, equivalent
-configuration options apply to all other scopes as well.
-
 ## Implementation Notes
 
 - The package only exports a constructor for the resource manager and
@@ -366,3 +406,24 @@ configuration options apply to all other scopes as well.
   pointer to a generic resource scope.
 - Peer and Protocol scopes, which may be created in response to
   network events, are periodically garbage collected.
+
+## Design Considerations
+
+- The Resource Manager must account for basic resource usage at all
+  levels of the stack, from the internals to application components
+  that use the network facilities of libp2p.
+- Basic resources include memory, streams, connections, and file
+  descriptors. These account for both space and time used by
+  the stack, as each resource has a direct effect on the system
+  availability and performance.
+- The design must support seamless integration for user applications,
+  which should reap the benefits of resource management without any
+  changes. That is, existing applications should be oblivious of the
+  resource manager and transparently obtain limits which protect it
+  from resource exhaustion and OOM conditions.
+- At the same time, the design must support opt-in resource usage
+  accounting for applications who want to explicitly utilize the
+  facilities of the system to inform about and constrain their own
+  resource usage.
+- The design must allow the user to set its own limits, which can be
+  static (fixed) or dynamic.
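Note on the README.md changes: the "Scaling Limits" text moved by this commit states the scaling rule only in words (the base limit, plus one increase per GB of memory passed to `Scale`, plus `FDFraction` of the file descriptors passed in). A minimal sketch of that arithmetic follows. The `baseLimit` and `limitIncrease` types and the `scale` function are hypothetical stand-ins written for this note, not the library's `BaseLimit`, `BaseLimitIncrease`, or `Scale`, and rounding memory down to whole GiB is an assumption.

```go
package main

import "fmt"

// baseLimit is a hypothetical, trimmed-down stand-in for the BaseLimit
// struct shown in the README; it is not the library type.
type baseLimit struct {
	Conns  int
	Memory int64
	FD     int
}

// limitIncrease is a hypothetical stand-in for BaseLimitIncrease.
type limitIncrease struct {
	Conns      int
	Memory     int64
	FDFraction float64
}

// scale applies the rule described in the README text: one increase per
// GiB of memory, plus FDFraction of the file descriptors passed in.
// Counting only whole GiB is an assumption of this sketch.
func scale(base baseLimit, inc limitIncrease, memory int64, numFD int) baseLimit {
	gib := int(memory >> 30)
	return baseLimit{
		Conns:  base.Conns + gib*inc.Conns,
		Memory: base.Memory + int64(gib)*inc.Memory,
		FD:     base.FD + int(inc.FDFraction*float64(numFD)),
	}
}

func main() {
	// Values taken from the system scope example in the README.
	base := baseLimit{Conns: 128, Memory: 128 << 20, FD: 256}
	inc := limitIncrease{Conns: 64, Memory: 256 << 20, FDFraction: 1}

	got := scale(base, inc, 4<<30, 1000)
	fmt.Println(got.Conns, got.FD) // prints "384 1256"
}
```

With 4 GB of memory and 1000 file descriptors this reproduces the two figures quoted in the README text: 384 for `Conns` (128 + 4*64) and 1256 file descriptors (256 + 1*1000).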
111
limit.go
@@ -6,117 +6,6 @@ limits. The resource manager only knows about things it is told about, so it's
 the responsibility of the user of this library (either go-libp2p or a go-libp2p
 user) to make sure they check with the resource manager before actually
 allocating the resource.
-
-Resource Management basics – Scopes
-
-The Resource Manager is an object that keeps track of how many resources have
-been allocated and what they have been allocated for. A resource is a stream,
-connection, or memory reservation. The resources can be allocated for the
-system, for a peer, for a protocol, or some combination.
-
-The things that are allocating resources are called "Scopes". A scope can have
-a parent scope that limits its resources. A scope can also have child scopes
-and it can limit the resources of the child scopes. Scopes form a directed
-acyclic graph (DAG) representing resource limits. For example, if scope A is
-the parent of scope B, and scope A has a connection limit of 10, then whatever
-limit B sets for connections it can never be greater than 10.
-
-The common scopes are:
-
-System scope: This is the root scope and represents all the resources that the
-resource manager knows about. It can define the absolute limit of the process.
-
-Transient scope: This is a scope for resources that have yet to be assigned a
-peer as an owner. When we first start a connection we are unsure who we're
-connecting to, so these connections are limited by the transient (and system)
-scope.
-
-Peer scope: This is a scope defined for a specific peer id.
-
-Connection scope: This is a scope for a specific connection.
-
-Allowlist system scope: This is a separate root scope for allowlisted peers. It
-lets you define limits for a set of trusted multiaddrs and peers. See
-`WithAllowlistedMultiaddrs` and ./docs/allowlist.md for more information on the
-allowlist.
-
-Allowlist transient scope: Similar to the above and the normal transient scope
-but for allowlisted peers.
-
-Protocol scope: This is a scope that defines limits for a specific protocol id.
-
-There are a couple other scopes that are combination of the above. For example
-there is a ProtocolPeer scope that represents the limits for a specific
-protocol id for a specific peer.
-
-Resource Management basics – Limits
-
-Limits are what define how much of a resource we are willing to allocate. See
-`BaseLimit` for what the limit looks like. These are attached to a scope so
-that the scope + limit define the resource constraints of the go-libp2p
-process.
-
-Limit scaling
-
-If the same go-libp2p application is run on various different machines, it's
-helpful to have limits that scale relative to the specs of the machine. This
-is where `ScalingLimitConfig` helps. With `ScalingLimitConfig` and it's
-`ScalingLimitConfig.Scale` method you can define what the minimum resources
-should be and how they scale up with machine size. Consult `limit_test.go` for
-usage examples.
-
-Default limits
-
-By default the resource manager ships with some reasonable scaling limits and
-makes a reasonable guess at how much system memory you want to dedicate to the
-go-libp2p process. For the default definitions see `DefaultLimits` and
-`ScalingLimitConfig.AutoScale()`.
-
-Tweaking Defaults
-
-If the defaults seem mostly okay, but you want to adjust one facet you can do
-simply copy the defaults and update the field you want to change. You can
-apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
-`.Apply`.
-
-Monitoring
-
-Once you have limits set, you'll want to monitor to see if you're running into
-your limits often. This could be a sign that you need to raise your limits
-(your process is more intensive than you originally thought) or that you need
-fix something in your application (surely you don't need over 1000 streams?).
-
-There are OpenCensus metrics that can be hooked up to the resource manager. See
-`obs/stats_test.go` for an example on how to enable this, and `DefaultViews` in
-`stats.go` for recommended views. These metrics can be hooked up to Prometheus
-or any other OpenCensus supported platform.
-
-There is also an included Grafana dashboard to help kickstart your
-observability into the resource manager. Find more information about it at
-`./obs/grafana-dashboards/README.md`.
-
-How to tune your limits
-
-Once you've set your limits and monitoring you can now tune your limits better.
-The `blocked_resources` metric will tell you what was blocked and for what
-scope. If you see a steady stream of these blocked requests it means your
-resource limits are too low for your usage. If you see a rare sudden spike,
-this is okay and it means the resource manager protected you from some anamoly.
-
-How to disable limits
-
-Sometimes disabling all limits is useful when you want to see how much
-resources you use during normal operation. You can then use this information to
-define your initial limits.
-
-How to debug "resource limit exceeded" errors
-
-If you're seeing a lot of "resource limit exceeded" errors take a look at the
-`blocked_resources` metric for some information on what was blocked. Also take
-a look at the resources used per stream, and per protocol (the Grafana
-Dashboard is ideal for this) and check if you're routinely hitting limits or if
-these are rare (but noisy) spikes.
-
 */
 package rcmgr
 