diff --git a/limit.go b/limit.go
index b1c7d63..07fba2c 100644
--- a/limit.go
+++ b/limit.go
@@ -1,3 +1,123 @@
+/*
+Package rcmgr is the resource manager for go-libp2p. It lets you track
+resources being used throughout your go-libp2p process and makes sure
+that the process doesn't use more resources than what you define as your
+limits. The resource manager only knows about things it is told about, so it's
+the responsibility of the user of this library (either go-libp2p or a go-libp2p
+user) to make sure they check with the resource manager before actually
+allocating the resource.
+
+Resource Management basics – Scopes
+
+The Resource Manager is an object that keeps track of how many resources have
+been allocated and what they have been allocated for. A resource is a stream,
+connection, or memory reservation. The resources can be allocated for the
+system, for a peer, for a protocol, or some combination.
+
+The things that are allocating resources are called "Scopes". A scope can have
+a parent scope that limits its resources. A scope can also have child scopes
+and it can limit the resources of the child scopes. Scopes form a directed
+acyclic graph (DAG) representing resource limits. For example, if scope A is
+the parent of scope B, and scope A has a connection limit of 10, then whatever
+limit B sets for connections, it can never be greater than 10.
+
+The common scopes are:
+
+System scope: This is the root scope and represents all the resources that the
+resource manager knows about. It can define the absolute limit of the process.
+
+Transient scope: This is a scope for resources that have yet to be assigned a
+peer as an owner. When we first start a connection we are unsure who we're
+connecting to, so these connections are limited by the transient (and system)
+scope.
+
+Peer scope: This is a scope defined for a specific peer id.
+
+Connection scope: This is a scope for a specific connection.
+
+Allowlist system scope: This is a separate root scope for allowlisted peers. It
+lets you define limits for a set of trusted multiaddrs and peers. See
+`WithAllowlistedMultiaddrs` and ./docs/allowlist.md for more information on the
+allowlist.
+
+Allowlist transient scope: Similar to the above and the normal transient scope
+but for allowlisted peers.
+
+Protocol scope: This is a scope that defines limits for a specific protocol id.
+
+There are a couple of other scopes that are combinations of the above. For example
+there is a ProtocolPeer scope that represents the limits for a specific
+protocol id for a specific peer.
+
+Resource Management basics – Limits
+
+Limits are what define how much of a resource we are willing to allocate. See
+`BaseLimit` for what the limit looks like. These are attached to a scope so
+that the scope + limit define the resource constraints of the go-libp2p
+process.
+
+Limit scaling
+
+If the same go-libp2p application is run on various different machines, it's
+helpful to have limits that scale relative to the specs of the machine. This
+is where `ScalingLimitConfig` helps. With `ScalingLimitConfig` and its
+`ScalingLimitConfig.Scale` method you can define what the minimum resources
+should be and how they scale up with machine size. Consult `limit_test.go` for
+usage examples.
+
+Default limits
+
+By default the resource manager ships with some reasonable scaling limits and
+makes a reasonable guess at how much system memory you want to dedicate to the
+go-libp2p process. For the default definitions see `DefaultLimits` and
+`ScalingLimitConfig.AutoScale()`.
+
+Tweaking Defaults
+
+If the defaults seem mostly okay, but you want to adjust one facet, you can
+simply copy the defaults and update the field you want to change. You can
+apply changes to a `BaseLimit`, `BaseLimitIncrease`, and `LimitConfig` with
+`.Apply`.
+
+Monitoring
+
+Once you have limits set, you'll want to monitor to see if you're running into
+your limits often. This could be a sign that you need to raise your limits
+(your process is more intensive than you originally thought) or that you need
+to fix something in your application (surely you don't need over 1000 streams?).
+
+There are OpenCensus metrics that can be hooked up to the resource manager. See
+`obs/stats_test.go` for an example of how to enable this, and `DefaultViews` in
+`stats.go` for recommended views. These metrics can be hooked up to Prometheus
+or any other OpenCensus supported platform.
+
+There is also an included Grafana dashboard to help kickstart your
+observability into the resource manager. Find more information about it at
+`./obs/grafana-dashboards/README.md`.
+
+How to tune your limits
+
+Once you've set up your limits and monitoring, you can tune your limits better.
+The `blocked_resources` metric will tell you what was blocked and for what
+scope. If you see a steady stream of these blocked requests it means your
+resource limits are too low for your usage. If you see a rare sudden spike,
+this is okay and it means the resource manager protected you from some anomaly.
+
+How to disable limits
+
+Sometimes disabling all limits is useful when you want to see how many
+resources you use during normal operation. You can then use this information to
+define your initial limits.
+
+How to debug "resource limit exceeded" errors
+
+If you're seeing a lot of "resource limit exceeded" errors, take a look at the
+`blocked_resources` metric for some information on what was blocked. Also take
+a look at the resources used per stream, and per protocol (the Grafana
+Dashboard is ideal for this) and check if you're routinely hitting limits or if
+these are rare (but noisy) spikes.
+
+*/
 package rcmgr
 
 import (