From 712edfd4d41ddda51f0519a62a07c09bb38e1730 Mon Sep 17 00:00:00 2001
From: Marco Munizaga
Date: Thu, 28 Jul 2022 04:46:47 -0700
Subject: [PATCH 1/5] Add docs in the godoc

---
 limit.go | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/limit.go b/limit.go
index b1c7d63..07fba2c 100644
--- a/limit.go
+++ b/limit.go
@@ -1,3 +1,123 @@
+/*
+Package rcmgr is the resource manager for go-libp2p. This allows you to track
+resources being used throughout your go-libp2p process. As well as making sure
+that the process doesn't use more resources than what you define as your
+limits. The resource manager only knows about things it is told about, so it's
+the responsibility of the user of this library (either go-libp2p or a go-libp2p
+user) to make sure they check with the resource manager before actually
+allocating the resource.
+
+Resource Management basics – Scopes
+
+The Resource Manager is an object that keeps track of how many resources have
+been allocated and what they have been allocated for. A resource is a stream,
+connection, or memory reservation. The resources can be allocated for the
+system, for a peer, for a protocol, or some combination.
+
+The things that are allocating resources are called "Scopes". A scope can have
+a parent scope that limits its resources. A scope can also have child scopes
+and it can limit the resources of the child scopes. Scopes form a directed
+acyclic graph (DAG) representing resource limits. For example, if scope A is
+the parent of scope B, and scope A has a connection limit of 10, then whatever
+limit B sets for connections it can never be greater than 10.
+
+The common scopes are:
+
+System scope: This is the root scope and represents all the resources that the
+resource manager knows about. It can define the absolute limit of the process.
+
+Transient scope: This is a scope for resources that have yet to be assigned a
+peer as an owner. When we first start a connection we are unsure who we're
+connecting to, so these connections are limited by the transient (and system)
+scope.
+
+Peer scope: This is a scope defined for a specific peer id.
+
+Connection scope: This is a scope for a specific connection.
+
+Allowlist system scope: This is a separate root scope for allowlisted peers. It
+lets you define limits for a set of trusted multiaddrs and peers. See
+`WithAllowlistedMultiaddrs` and ./docs/allowlist.md for more information on the
+allowlist.
+
+Allowlist transient scope: Similar to the above and the normal transient scope
+but for allowlisted peers.
+
+Protocol scope: This is a scope that defines limits for a specific protocol id.
+
+There are a couple other scopes that are combination of the above. For example
+there is a ProtocolPeer scope that represents the limits for a specific
+protocol id for a specific peer.
+
+Resource Management basics – Limits
+
+Limits are what define how much of a resource we are willing to allocate. See
+`BaseLimit` for what the limit looks like. These are attached to a scope so
+that the scope + limit define the resource constraints of the go-libp2p
+process.
+
+Limit scaling
+
+If the same go-libp2p application is run on various different machines, it's
+helpful to have limits that scale relative to the specs of the machine. This
+is where `ScalingLimitConfig` helps. With `ScalingLimitConfig` and it's
+`ScalingLimitConfig.Scale` method you can define what the minimum resources
+should be and how they scale up with machine size. Consult `limit_test.go` for
+usage examples.
+
+Default limits
+
+By default the resource manager ships with some reasonable scaling limits and
+makes a reasonable guess at how much system memory you want to dedicate to the
+go-libp2p process. For the default definitions see `DefaultLimits` and
+`ScalingLimitConfig.AutoScale()`.
+
+Tweaking Defaults
+
+If the defaults seem mostly okay, but you want to adjust one facet you can do
+simply copy the defaults and update the field you want to change. You can
+apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
+`.Apply`.
+
+Monitoring
+
+Once you have limits set, you'll want to monitor to see if you're running into
+your limits often. This could be a sign that you need to raise your limits
+(your process is more intensive than you originally thought) or that you need
+fix something in your application (surely you don't need over 1000 streams?).
+
+There are OpenCensus metrics that can be hooked up to the resource manager. See
+`obs/stats_test.go` for an example on how to enable this, and `DefaultViews` in
+`stats.go` for recommended views. These metrics can be hooked up to Prometheus
+or any other OpenCensus supported platform.
+
+There is also an included Grafana dashboard to help kickstart your
+observability into the resource manager. Find more information about it at
+`./obs/grafana-dashboards/README.md`.
+
+How to tune your limits
+
+Once you've set your limits and monitoring you can now tune your limits better.
+The `blocked_resources` metric will tell you what was blocked and for what
+scope. If you see a steady stream of these blocked requests it means your
+resource limits are too low for your usage. If you see a rare sudden spike,
+this is okay and it means the resource manager protected you from some anamoly.
+
+How to disable limits
+
+Sometimes disabling all limits is useful when you want to see how much
+resources you use during normal operation. You can then use this information to
+define your initial limits.
+
+How to debug "resource limit exceeded" errors
+
+If you're seeing a lot of "resource limit exceeded" errors take a look at the
+`blocked_resources` metric for some information on what was blocked. Also take
+a look at the resources used per stream, and per protocol (the Grafana
+Dashboard is ideal for this) and check if you're routinely hitting limits or if
+these are rare (but noisy) spikes.
+
+*/
 package rcmgr

 import (

From d81430f88abdfe3f90284971fdde5d1c56a7e81d Mon Sep 17 00:00:00 2001
From: Marco Munizaga
Date: Tue, 9 Aug 2022 12:18:33 +0200
Subject: [PATCH 2/5] Merge new docs into readme

---
 README.md | 237 ++++++++++++++++++++++++++++++++++--------------------
 limit.go  | 111 -------------------------
 2 files changed, 149 insertions(+), 199 deletions(-)

diff --git a/README.md b/README.md
index 7e3972b..229520a 100644
--- a/README.md
+++ b/README.md
@@ -7,28 +7,6 @@ The implementation is based on the concept of Resource Management
 Scopes, whereby resource usage is constrained by a DAG of scopes,
 accounting for multiple levels of resource constraints.

-## Design Considerations
-
-- The Resource Manager must account for basic resource usage at all
-  levels of the stack, from the internals to application components
-  that use the network facilities of libp2p.
-- Basic resources include memory, streams, connections, and file
-  descriptors. These account for both space and time used by
-  the stack, as each resource has a direct effect on the system
-  availability and performance.
-- The design must support seamless integration for user applications,
-  which should reap the benefits of resource management without any
-  changes. That is, existing applications should be oblivious of the
-  resource manager and transparently obtain limits which protect it
-  from resource exhaustion and OOM conditions.
-- At the same time, the design must support opt-in resource usage
-  accounting for applications who want to explicitly utilize the
-  facilities of the system to inform about and constrain their own
-  resource usage.
-- The design must allow the user to set its own limits, which can be
-  static (fixed) or dynamic.
-
-
 ## Basic Resources

 ### Memory
@@ -207,7 +185,7 @@ scope.
 ### User Transaction Scopes

 User transaction scopes can be created as a child of any extant
-resource scope, and provide the prgrammer with a delimited scope for
+resource scope, and provide the programmer with a delimited scope for
 easy resource accounting. Transactions may form a tree that is rooted
 to some canonical scope in the scope DAG.

@@ -230,6 +208,133 @@ limits for the system and transient scopes, default and specific
 limits for services, protocols, and peers, and limits for connections
 and streams.

+### Scaling Limits
+
+When building software that is supposed to run on many different kind of machines,
+with various memory and CPU configurations, it is desireable to have limits that
+scale with the size of the machine.
+
+This is done using the `ScalingLimitConfig`. For every scope, this configuration
+struct defines the absolutely bare minimum limits, and an (optional) increase of
+these limits, which will be applied on nodes that have sufficient memory.
+
+A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
+used to initialize a fixed limiter as shown above) by calling the `Scale` method.
+The `Scale` method takes two parameters: the amount of memory and the number of file
+descriptors that an application is willing to dedicate to libp2p.
+
+These amounts will differ between use cases: A blockchain node running on a dedicated
+server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
+other end of the spectrum, a desktop companion application running as a background
+task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
+memory to libp2p.
+
+For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
+which determines the amount of memory and file descriptors available on the
+system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to
+libp2p.
+
+For example, one might set:
+```go
+var scalingLimits = ScalingLimitConfig{
+  SystemBaseLimit: BaseLimit{
+    ConnsInbound: 64,
+    ConnsOutbound: 128,
+    Conns: 128,
+    StreamsInbound: 512,
+    StreamsOutbound: 1024,
+    Streams: 1024,
+    Memory: 128 << 20,
+    FD: 256,
+  },
+  SystemLimitIncrease: BaseLimitIncrease{
+    ConnsInbound: 32,
+    ConnsOutbound: 64,
+    Conns: 64,
+    StreamsInbound: 256,
+    StreamsOutbound: 512,
+    Streams: 512,
+    Memory: 256 << 20,
+    FDFraction: 1,
+  },
+}
+```
+
+The base limit (`SystemBaseLimit`) here is the minimum configuration that any
+node will have, no matter how little memory it possesses. For every GB of memory
+passed into the `Scale` method, an increase of (`SystemLimitIncrease`) is added.
+
+For Example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
+`Conns` (128 + 4*64).
+
+The `FDFraction` defines how many of the file descriptors are allocated to this
+scope. In the example above, when called with a file descriptor value of 1000,
+this would result in a limit of 1256 file descriptors for the system scope.
+
+Note that we only showed the configuration for the system scope here, equivalent
+configuration options apply to all other scopes as well.
+
+### Default limits
+
+By default the resource manager ships with some reasonable scaling limits and
+makes a reasonable guess at how much system memory you want to dedicate to the
+go-libp2p process. For the default definitions see `DefaultLimits` and
+`ScalingLimitConfig.AutoScale()`.
+
+### Tweaking Defaults
+
+If the defaults seem mostly okay, but you want to adjust one facet you can do
+simply copy the defaults and update the field you want to change. You can
+apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
+`.Apply`.
+
+### How to tune your limits
+
+Once you've set your limits and monitoring (see below) you can now tune your
+limits better. The `blocked_resources` metric will tell you what was blocked
+and for what scope. If you see a steady stream of these blocked requests it
+means your resource limits are too low for your usage. If you see a rare sudden
+spike, this is okay and it means the resource manager protected you from some
+anamoly.
+
+### How to disable limits
+
+Sometimes disabling all limits is useful when you want to see how much
+resources you use during normal operation. You can then use this information to
+define your initial limits.
+
+### Debug "resource limit exceeded" errors
+
+These errors occur whenever we've hit a limit. For example we'll get this error
+if we are at our limit for the number of streams we can have, and we try to open
+one more.
+
+If you're seeing a lot of "resource limit exceeded" errors take a look at the
+`blocked_resources` metric for some information on what was blocked. Also take
+a look at the resources used per stream, and per protocol (the Grafana
+Dashboard is ideal for this) and check if you're routinely hitting limits or if
+these are rare (but noisy) spikes.
+
+When debugging in general, in may help to search your logs for errors that match
+the string "resource limit exceeded" to see if you're hitting some limits
+routinely.
+
+## Monitoring
+
+Once you have limits set, you'll want to monitor to see if you're running into
+your limits often. This could be a sign that you need to raise your limits
+(your process is more intensive than you originally thought) or that you need
+fix something in your application (surely you don't need over 1000 streams?).
+
+There are OpenCensus metrics that can be hooked up to the resource manager. See
+`obs/stats_test.go` for an example on how to enable this, and `DefaultViews` in
+`stats.go` for recommended views. These metrics can be hooked up to Prometheus
+or any other OpenCensus supported platform.
+
+There is also an included Grafana dashboard to help kickstart your
+observability into the resource manager. Find more information about it at
+`./obs/grafana-dashboards/README.md`.
+
 ## Examples

 Here we consider some concrete examples that can ellucidate the abstract
@@ -289,71 +394,6 @@ limiter := NewFixedLimiter(limits)
 ```
 The `limits` allows fine-grained control of resource usage on all scopes.

-### Scaling Limits
-
-When building software that is supposed to run on many different kind of machines,
-with various memory and CPU configurations, it is desireable to have limits that
-scale with the size of the machine.
-
-This is done using the `ScalingLimitConfig`. For every scope, this configuration
-struct defines the absolutely bare minimum limits, and an (optional) increase of
-these limits, which will be applied on nodes that have sufficient memory.
-
-A `ScalingLimitConfig` can be converted into a `LimitConfig` (which can then be
-used to initialize a fixed limiter as shown above) by calling the `Scale` method.
-The `Scale` method takes two parameters: the amount of memory and the number of file
-descriptors that an application is willing to dedicate to libp2p.
-
-These amounts will differ between use cases: A blockchain node running on a dedicated
-server might have a lot of memory, and dedicate 1/4 of that memory to libp2p. On the
-other end of the spectrum, a desktop companion application running as a background
-task on a consumer laptop will probably dedicate significantly less than 1/4 of its system
-memory to libp2p.
-
-For convenience, the `ScalingLimitConfig` also provides an `AutoScale` method,
-which determines the amount of memory and file descriptors available on the
-system, and dedicates up to 1/8 of the memory and 1/2 of the file descriptors to libp2p.
-
-For example, one might set:
-```go
-var scalingLimits = ScalingLimitConfig{
-  SystemBaseLimit: BaseLimit{
-    ConnsInbound: 64,
-    ConnsOutbound: 128,
-    Conns: 128,
-    StreamsInbound: 512,
-    StreamsOutbound: 1024,
-    Streams: 1024,
-    Memory: 128 << 20,
-    FD: 256,
-  },
-  SystemLimitIncrease: BaseLimitIncrease{
-    ConnsInbound: 32,
-    ConnsOutbound: 64,
-    Conns: 64,
-    StreamsInbound: 256,
-    StreamsOutbound: 512,
-    Streams: 512,
-    Memory: 256 << 20,
-    FDFraction: 1,
-  },
-}
-```
-
-The base limit (`SystemBaseLimit`) here is the minimum configuration that any
-node will have, no matter how little memory it possesses. For every GB of memory
-passed into the `Scale` method, an increase of (`SystemLimitIncrease`) is added.
-
-For Example, calling `Scale` with 4 GB of memory will result in a limit of 384 for
-`Conns` (128 + 4*64).
-
-The `FDFraction` defines how many of the file descriptors are allocated to this
-scope. In the example above, when called with a file descriptor value of 1000,
-this would result in a limit of 1256 file descriptors for the system scope.
-
-Note that we only showed the configuration for the system scope here, equivalent
-configuration options apply to all other scopes as well.
-
 ## Implementation Notes

 - The package only exports a constructor for the resource manager and
@@ -366,3 +406,24 @@ configuration options apply to all other scopes as well.
   pointer to a generic resource scope.
 - Peer and Protocol scopes, which may be created in response to
   network events, are periodically garbage collected.
+
+## Design Considerations
+
+- The Resource Manager must account for basic resource usage at all
+  levels of the stack, from the internals to application components
+  that use the network facilities of libp2p.
+- Basic resources include memory, streams, connections, and file
+  descriptors. These account for both space and time used by
+  the stack, as each resource has a direct effect on the system
+  availability and performance.
+- The design must support seamless integration for user applications,
+  which should reap the benefits of resource management without any
+  changes. That is, existing applications should be oblivious of the
+  resource manager and transparently obtain limits which protect it
+  from resource exhaustion and OOM conditions.
+- At the same time, the design must support opt-in resource usage
+  accounting for applications who want to explicitly utilize the
+  facilities of the system to inform about and constrain their own
+  resource usage.
+- The design must allow the user to set its own limits, which can be
+  static (fixed) or dynamic.
diff --git a/limit.go b/limit.go
index 07fba2c..21dffb7 100644
--- a/limit.go
+++ b/limit.go
@@ -6,117 +6,6 @@ limits. The resource manager only knows about things it is told about, so it's
 the responsibility of the user of this library (either go-libp2p or a go-libp2p
 user) to make sure they check with the resource manager before actually
 allocating the resource.
-
-Resource Management basics – Scopes
-
-The Resource Manager is an object that keeps track of how many resources have
-been allocated and what they have been allocated for. A resource is a stream,
-connection, or memory reservation. The resources can be allocated for the
-system, for a peer, for a protocol, or some combination.
-
-The things that are allocating resources are called "Scopes". A scope can have
-a parent scope that limits its resources. A scope can also have child scopes
-and it can limit the resources of the child scopes. Scopes form a directed
-acyclic graph (DAG) representing resource limits. For example, if scope A is
-the parent of scope B, and scope A has a connection limit of 10, then whatever
-limit B sets for connections it can never be greater than 10.
-
-The common scopes are:
-
-System scope: This is the root scope and represents all the resources that the
-resource manager knows about. It can define the absolute limit of the process.
-
-Transient scope: This is a scope for resources that have yet to be assigned a
-peer as an owner. When we first start a connection we are unsure who we're
-connecting to, so these connections are limited by the transient (and system)
-scope.
-
-Peer scope: This is a scope defined for a specific peer id.
-
-Connection scope: This is a scope for a specific connection.
-
-Allowlist system scope: This is a separate root scope for allowlisted peers. It
-lets you define limits for a set of trusted multiaddrs and peers. See
-`WithAllowlistedMultiaddrs` and ./docs/allowlist.md for more information on the
-allowlist.
-
-Allowlist transient scope: Similar to the above and the normal transient scope
-but for allowlisted peers.
-
-Protocol scope: This is a scope that defines limits for a specific protocol id.
-
-There are a couple other scopes that are combination of the above. For example
-there is a ProtocolPeer scope that represents the limits for a specific
-protocol id for a specific peer.
-
-Resource Management basics – Limits
-
-Limits are what define how much of a resource we are willing to allocate. See
-`BaseLimit` for what the limit looks like. These are attached to a scope so
-that the scope + limit define the resource constraints of the go-libp2p
-process.
-
-Limit scaling
-
-If the same go-libp2p application is run on various different machines, it's
-helpful to have limits that scale relative to the specs of the machine. This
-is where `ScalingLimitConfig` helps. With `ScalingLimitConfig` and it's
-`ScalingLimitConfig.Scale` method you can define what the minimum resources
-should be and how they scale up with machine size. Consult `limit_test.go` for
-usage examples.
-
-Default limits
-
-By default the resource manager ships with some reasonable scaling limits and
-makes a reasonable guess at how much system memory you want to dedicate to the
-go-libp2p process. For the default definitions see `DefaultLimits` and
-`ScalingLimitConfig.AutoScale()`.
-
-Tweaking Defaults
-
-If the defaults seem mostly okay, but you want to adjust one facet you can do
-simply copy the defaults and update the field you want to change. You can
-apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
-`.Apply`.
-
-Monitoring
-
-Once you have limits set, you'll want to monitor to see if you're running into
-your limits often. This could be a sign that you need to raise your limits
-(your process is more intensive than you originally thought) or that you need
-fix something in your application (surely you don't need over 1000 streams?).
-
-There are OpenCensus metrics that can be hooked up to the resource manager. See
-`obs/stats_test.go` for an example on how to enable this, and `DefaultViews` in
-`stats.go` for recommended views. These metrics can be hooked up to Prometheus
-or any other OpenCensus supported platform.
-
-There is also an included Grafana dashboard to help kickstart your
-observability into the resource manager. Find more information about it at
-`./obs/grafana-dashboards/README.md`.
-
-How to tune your limits
-
-Once you've set your limits and monitoring you can now tune your limits better.
-The `blocked_resources` metric will tell you what was blocked and for what
-scope. If you see a steady stream of these blocked requests it means your
-resource limits are too low for your usage. If you see a rare sudden spike,
-this is okay and it means the resource manager protected you from some anamoly.
-
-How to disable limits
-
-Sometimes disabling all limits is useful when you want to see how much
-resources you use during normal operation. You can then use this information to
-define your initial limits.
-
-How to debug "resource limit exceeded" errors
-
-If you're seeing a lot of "resource limit exceeded" errors take a look at the
-`blocked_resources` metric for some information on what was blocked. Also take
-a look at the resources used per stream, and per protocol (the Grafana
-Dashboard is ideal for this) and check if you're routinely hitting limits or if
-these are rare (but noisy) spikes.
-
 */
 package rcmgr

From c50151cb059abd0f752b3aab8e64476b2a593c80 Mon Sep 17 00:00:00 2001
From: Marco Munizaga
Date: Tue, 9 Aug 2022 12:53:28 +0200
Subject: [PATCH 3/5] Nits

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 229520a..e853c8c 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ computational time) at the system level. They are also a scarce
 resource, as typically (unless the user explicitly intervenes) they
 are constrained by the system. Exhaustion of file descriptors may
 render the application incapable of operating (e.g. because it is
-unable to open a file), most importantly for libp2p because most
+unable to open a file), this is important for libp2p because most
 operating systems represent sockets as file descriptors.

 ### Connections
@@ -301,7 +301,7 @@ anamoly.

 Sometimes disabling all limits is useful when you want to see how much
 resources you use during normal operation. You can then use this information to
-define your initial limits.
+define your initial limits. Disable the limits by using `InfiniteLimits`.

 ### Debug "resource limit exceeded" errors

From 056c0f6622f9a29b871caa101bc0599fd94d4de6 Mon Sep 17 00:00:00 2001
From: Marco Munizaga
Date: Tue, 9 Aug 2022 17:56:43 +0200
Subject: [PATCH 4/5] Add section about allowlist

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index e853c8c..4d0fee2 100644
--- a/README.md
+++ b/README.md
@@ -335,6 +335,16 @@ There is also an included Grafana dashboard to help kickstart your
 observability into the resource manager. Find more information about it at
 `./obs/grafana-dashboards/README.md`.

+## Allowlisting multiaddrs to mitigate eclipse attacks
+
+If you have a set of trusted peers and IP addresses, you can use the resource
+manager's [Allowlist](./docs/allowlist.md) to protect yourself from eclipse
+attacks. The set of peers in the allowlist will have their own limits in case
+the normal limits are reached. This means you will always be able to connect to
+these trusted peers even if you've already reached your system limits.
+
+Look at `WithAllowlistedMultiaddrs` and its example in the GoDoc to learn more.
+
 ## Examples

 Here we consider some concrete examples that can ellucidate the abstract

From 703898608a926ba55d054ed1cc1d3f3aa4ac9fc9 Mon Sep 17 00:00:00 2001
From: Marco Munizaga
Date: Wed, 10 Aug 2022 17:21:08 -0700
Subject: [PATCH 5/5] Add example

---
 README.md | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 4d0fee2..0a30098 100644
--- a/README.md
+++ b/README.md
@@ -284,13 +284,24 @@ go-libp2p process. For the default definitions see `DefaultLimits` and
 ### Tweaking Defaults

 If the defaults seem mostly okay, but you want to adjust one facet you can do
-simply copy the defaults and update the field you want to change. You can
+simply copy the default struct object and update the field you want to change. You can
 apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
 `.Apply`.

+Example
+```
+// An example on how to tweak the default limits
+tweakedDefaults := DefaultLimits
+tweakedDefaults.ProtocolBaseLimit.Apply(BaseLimit{
+  Streams: 1024,
+  StreamsInbound: 512,
+  StreamsOutbound: 512,
+})
+```
+
 ### How to tune your limits

-Once you've set your limits and monitoring (see below) you can now tune your
+Once you've set your limits and monitoring (see [Monitoring](#monitoring) below) you can now tune your
 limits better. The `blocked_resources` metric will tell you what was blocked
 and for what scope. If you see a steady stream of these blocked requests it
 means your resource limits are too low for your usage. If you see a rare sudden
@@ -305,9 +316,9 @@ define your initial limits. Disable the limits by using `InfiniteLimits`.

 ### Debug "resource limit exceeded" errors

-These errors occur whenever we've hit a limit. For example we'll get this error
-if we are at our limit for the number of streams we can have, and we try to open
-one more.
+These errors occur whenever a limit is hit. For example you'll get this error if
+you are at your limit for the number of streams you can have, and you try to
+open one more.

 If you're seeing a lot of "resource limit exceeded" errors take a look at the
 `blocked_resources` metric for some information on what was blocked. Also take