diff --git a/README.md b/README.md
index 7e3972b..0a30098 100644
--- a/README.md
+++ b/README.md
@@ -7,28 +7,6 @@ The implementation is based on the concept of Resource Management
 Scopes, whereby resource usage is constrained by a DAG of scopes,
 accounting for multiple levels of resource constraints.
 
-## Design Considerations
-
-- The Resource Manager must account for basic resource usage at all
-  levels of the stack, from the internals to application components
-  that use the network facilities of libp2p.
-- Basic resources include memory, streams, connections, and file
-  descriptors. These account for both space and time used by
-  the stack, as each resource has a direct effect on the system
-  availability and performance.
-- The design must support seamless integration for user applications,
-  which should reap the benefits of resource management without any
-  changes. That is, existing applications should be oblivious of the
-  resource manager and transparently obtain limits which protect it
-  from resource exhaustion and OOM conditions.
-- At the same time, the design must support opt-in resource usage
-  accounting for applications who want to explicitly utilize the
-  facilities of the system to inform about and constrain their own
-  resource usage.
-- The design must allow the user to set its own limits, which can be
-  static (fixed) or dynamic.
-
-
 ## Basic Resources
 
 ### Memory
@@ -51,7 +29,7 @@ computational time) at the system level. They are also a scarce
 resource, as typically (unless the user explicitly intervenes) they
 are constrained by the system. Exhaustion of file descriptors may
 render the application incapable of operating (e.g. because it is
-unable to open a file), most importantly for libp2p because most
+unable to open a file). This is important for libp2p because most
 operating systems represent sockets as file descriptors.
 
 ### Connections
@@ -207,7 +185,7 @@ scope.
 ### User Transaction Scopes
 
 User transaction scopes can be created as a child of any extant
-resource scope, and provide the prgrammer with a delimited scope for
+resource scope, and provide the programmer with a delimited scope for
 easy resource accounting. Transactions may form a tree that is rooted
 to some canonical scope in the scope DAG.
 
@@ -230,6 +208,154 @@ limits for the system and transient scopes, default and specific limits
 for services, protocols, and peers, and limits for connections and
 streams.
 
+### Scaling Limits
+
+When building software that is supposed to run on many different kinds of machines,
+with various memory and CPU configurations, it is desirable to have limits that
+scale with the size of the machine.
+
+This is done using the `ScalingLimitConfig`. For every scope, this configuration
+struct defines the absolutely bare minimum limits, and an (optional) increase of
+these limits, which will be applied on nodes that have sufficient memory.
+
+A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
+used to initialize a fixed limiter, as shown in the Examples section below) by calling the `Scale` method.
+The `Scale` method takes two parameters: the amount of memory and the number of file
+descriptors that an application is willing to dedicate to libp2p.
+
+These amounts will differ between use cases: a blockchain node running on a dedicated
+server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
+other end of the spectrum, a desktop companion application running as a background
+task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
+memory to libp2p.
+
+For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
+which determines the amount of memory and file descriptors available on the
+system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to
+libp2p.
+
+For example, one might set:
+```go
+var scalingLimits = ScalingLimitConfig{
+	SystemBaseLimit: BaseLimit{
+		ConnsInbound:    64,
+		ConnsOutbound:   128,
+		Conns:           128,
+		StreamsInbound:  512,
+		StreamsOutbound: 1024,
+		Streams:         1024,
+		Memory:          128 << 20,
+		FD:              256,
+	},
+	SystemLimitIncrease: BaseLimitIncrease{
+		ConnsInbound:    32,
+		ConnsOutbound:   64,
+		Conns:           64,
+		StreamsInbound:  256,
+		StreamsOutbound: 512,
+		Streams:         512,
+		Memory:          256 << 20,
+		FDFraction:      1,
+	},
+}
+```
+
+The base limit (`SystemBaseLimit`) here is the minimum configuration that any
+node will have, no matter how little memory it possesses. For every GB of memory
+passed into the `Scale` method, the increase defined in `SystemLimitIncrease` is added.
+
+For example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
+`Conns` (128 + 4*64).
+
+The `FDFraction` defines what fraction of the file descriptors is allocated to this
+scope. In the example above, when called with a file descriptor value of 1000,
+this would result in a limit of 1256 file descriptors for the system scope (256 + 1*1000).
+
+Note that we only showed the configuration for the system scope here; equivalent
+configuration options apply to all other scopes as well.
+
+### Default limits
+
+By default, the resource manager ships with some reasonable scaling limits and
+makes a reasonable guess at how much system memory you want to dedicate to the
+go-libp2p process. For the default definitions, see `DefaultLimits` and
+`ScalingLimitConfig.AutoScale()`.
+
+### Tweaking Defaults
+
+If the defaults seem mostly okay but you want to adjust one facet, you can
+simply copy the default struct object and update the field you want to change. You can
+apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
+`.Apply`.
+
+Example
+```go
+// An example on how to tweak the default limits.
+// DefaultLimits is a struct value, so this assignment makes a copy.
+tweakedDefaults := DefaultLimits
+// Update only the protocol-scope stream limits on the copy.
+tweakedDefaults.ProtocolBaseLimit.Streams = 1024
+tweakedDefaults.ProtocolBaseLimit.StreamsInbound = 512
+tweakedDefaults.ProtocolBaseLimit.StreamsOutbound = 512
+```
+
+### How to tune your limits
+
+Once you've set up your limits and monitoring (see [Monitoring](#monitoring) below), you can
+tune your limits further. The `blocked_resources` metric will tell you what was blocked
+and for what scope. If you see a steady stream of these blocked requests, it
+means your resource limits are too low for your usage. If you see a rare sudden
+spike, this is okay: it means the resource manager protected you from some
+anomaly.
+
+### How to disable limits
+
+Sometimes disabling all limits is useful when you want to see how many
+resources you use during normal operation. You can then use this information to
+define your initial limits. Disable the limits by using `InfiniteLimits`.
+
+### Debug "resource limit exceeded" errors
+
+These errors occur whenever a limit is hit. For example, you'll get this error if
+you are at your limit for the number of streams you can have and you try to
+open one more.
+
+If you're seeing a lot of "resource limit exceeded" errors, take a look at the
+`blocked_resources` metric for some information on what was blocked. Also take
+a look at the resources used per stream and per protocol (the Grafana
+Dashboard is ideal for this) and check if you're routinely hitting limits or if
+these are rare (but noisy) spikes.
+
+When debugging in general, it may help to search your logs for errors that match
+the string "resource limit exceeded" to see if you're hitting some limits
+routinely.
+
+## Monitoring
+
+Once you have limits set, you'll want to monitor to see if you're running into
+your limits often. This could be a sign that you need to raise your limits
+(your process is more intensive than you originally thought) or that you need to
+fix something in your application (surely you don't need over 1000 streams?).
+
+There are OpenCensus metrics that can be hooked up to the resource manager. See
+`obs/stats_test.go` for an example of how to enable this, and `DefaultViews` in
+`stats.go` for recommended views. These metrics can be hooked up to Prometheus
+or any other OpenCensus-supported platform.
+
+There is also an included Grafana dashboard to help kickstart your
+observability into the resource manager. Find more information about it at
+`./obs/grafana-dashboards/README.md`.
+
+## Allowlisting multiaddrs to mitigate eclipse attacks
+
+If you have a set of trusted peers and IP addresses, you can use the resource
+manager's [Allowlist](./docs/allowlist.md) to protect yourself from eclipse
+attacks. The set of peers in the allowlist will have their own limits in case
+the normal limits are reached. This means you will always be able to connect to
+these trusted peers even if you've already reached your system limits.
+
+Look at `WithAllowlistedMultiaddrs` and its example in the GoDoc to learn more.
+
 ## Examples
 
 Here we consider some concrete examples that can ellucidate the abstract
@@ -289,71 +415,6 @@ limiter := NewFixedLimiter(limits)
 ```
 The `limits` allows fine-grained control of resource usage on all scopes.
 
-### Scaling Limits
-
-When building software that is supposed to run on many different kind of machines,
-with various memory and CPU configurations, it is desireable to have limits that
-scale with the size of the machine.
-
-This is done using the `ScalingLimitConfig`. For every scope, this configuration
-struct defines the absolutely bare minimum limits, and an (optional) increase of
-these limits, which will be applied on nodes that have sufficient memory.
-
-A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
-used to initialize a fixed limiter as shown above) by calling the `Scale` method.
-The `Scale` method takes two parameters: the amount of memory and the number of file
-descriptors that an application is willing to dedicate to libp2p.
-
-These amounts will differ between use cases: A blockchain node running on a dedicated
-server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
-other end of the spectrum, a desktop companion application running as a background
-task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
-memory to libp2p.
-
-For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
-which determines the amount of memory and file descriptors available on the
-system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to libp2p.
-
-For example, one might set:
-```go
-var scalingLimits = ScalingLimitConfig{
-	SystemBaseLimit: BaseLimit{
-		ConnsInbound:    64,
-		ConnsOutbound:   128,
-		Conns:           128,
-		StreamsInbound:  512,
-		StreamsOutbound: 1024,
-		Streams:         1024,
-		Memory:          128 << 20,
-		FD:              256,
-	},
-	SystemLimitIncrease: BaseLimitIncrease{
-		ConnsInbound:    32,
-		ConnsOutbound:   64,
-		Conns:           64,
-		StreamsInbound:  256,
-		StreamsOutbound: 512,
-		Streams:         512,
-		Memory:          256 << 20,
-		FDFraction:      1,
-	},
-}
-```
-
-The base limit (`SystemBaseLimit`) here is the minimum configuration that any
-node will have, no matter how little memory it possesses. For every GB of memory
-passed into the `Scale` method, an increase of (`SystemLimitIncrease`) is added.
-
-For Example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
-`Conns` (128 + 4*64).
-
-The `FDFraction` defines how many of the file descriptors are allocated to this
-scope. In the example above, when called with a file descriptor value of 1000,
-this would result in a limit of 1256 file descriptors for the system scope.
-
-Note that we only showed the configuration for the system scope here, equivalent
-configuration options apply to all other scopes as well.
-
 ## Implementation Notes
 
 - The package only exports a constructor for the resource manager and
@@ -366,3 +427,24 @@ configuration options apply to all other scopes as well.
   pointer to a generic resource scope.
 - Peer and Protocol scopes, which may be created in response to network
   events, are periodically garbage collected.
+
+## Design Considerations
+
+- The Resource Manager must account for basic resource usage at all
+  levels of the stack, from the internals to application components
+  that use the network facilities of libp2p.
+- Basic resources include memory, streams, connections, and file
+  descriptors. These account for both space and time used by
+  the stack, as each resource has a direct effect on the system
+  availability and performance.
+- The design must support seamless integration for user applications,
+  which should reap the benefits of resource management without any
+  changes. That is, existing applications should be oblivious to the
+  resource manager and transparently obtain limits which protect them
+  from resource exhaustion and OOM conditions.
+- At the same time, the design must support opt-in resource usage
+  accounting for applications that want to explicitly utilize the
+  facilities of the system to inform about and constrain their own
+  resource usage.
+- The design must allow the user to set their own limits, which can be
+  static (fixed) or dynamic.
diff --git a/limit.go b/limit.go
index b1c7d63..21dffb7 100644
--- a/limit.go
+++ b/limit.go
@@ -1,3 +1,12 @@
+/*
+Package rcmgr is the resource manager for go-libp2p. It allows you to track
+resources being used throughout your go-libp2p process and makes sure
+that the process doesn't use more resources than what you define as your
+limits. The resource manager only knows about things it is told about, so it's
+the responsibility of the user of this library (either go-libp2p or a go-libp2p
+user) to make sure they check with the resource manager before actually
+allocating the resource.
+*/
 package rcmgr
 
 import (