diff --git a/sources/tech/20150717 How to collect NGINX metrics - Part 2.md b/sources/tech/20150717 How to collect NGINX metrics - Part 2.md new file mode 100644 index 0000000000..8d83b3a0f6 --- /dev/null +++ b/sources/tech/20150717 How to collect NGINX metrics - Part 2.md @@ -0,0 +1,237 @@ +How to collect NGINX metrics - Part 2 +================================================================================ +![](http://www.datadoghq.com/wp-content/uploads/2015/07/NGINX_hero_2.png) + +### How to get the NGINX metrics you need ### + +How you go about capturing metrics depends on which version of NGINX you are using, as well as which metrics you wish to access. (See [the companion article][1] for an in-depth exploration of NGINX metrics.) Free, open-source NGINX and the commercial product NGINX Plus both have status modules that report metrics, and NGINX can also be configured to report certain metrics in its logs: + +注:表格 + +++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Metric | NGINX (open-source) | NGINX Plus | NGINX logs |
| --- | --- | --- | --- |
| accepts / accepted | x | x |  |
| handled | x | x |  |
| dropped | x | x |  |
| active | x | x |  |
| requests / total | x | x |  |
| 4xx codes |  | x | x |
| 5xx codes |  | x | x |
| request time |  |  | x |
+ +#### Metrics collection: NGINX (open-source) #### + +Open-source NGINX exposes several basic metrics about server activity on a simple status page, provided that you have the HTTP [stub status module][2] enabled. To check if the module is already enabled, run: + + nginx -V 2>&1 | grep -o with-http_stub_status_module + +The status module is enabled if you see with-http_stub_status_module as output in the terminal. + +If that command returns no output, you will need to enable the status module. You can use the --with-http_stub_status_module configuration parameter when [building NGINX from source][3]: + + ./configure \ + … \ + --with-http_stub_status_module + make + sudo make install + +After verifying the module is enabled or enabling it yourself, you will also need to modify your NGINX configuration to set up a locally accessible URL (e.g., /nginx_status) for the status page: + + server { + location /nginx_status { + stub_status on; + + access_log off; + allow 127.0.0.1; + deny all; + } + } + +Note: The server blocks of the NGINX config are usually found not in the master configuration file (e.g., /etc/nginx/nginx.conf) but in supplemental configuration files that are referenced by the master config. To find the relevant configuration files, first locate the master config by running: + + nginx -t + +Open the master configuration file listed, and look for lines beginning with include near the end of the http block, such as: + + include /etc/nginx/conf.d/*.conf; + +In one of the referenced config files you should find the main server block, which you can modify as above to configure NGINX metrics reporting. After changing any configurations, reload the configs by executing: + + nginx -s reload + +Now you can view the status page to see your metrics: + + Active connections: 24 + server accepts handled requests + 1156958 1156958 4491319 + Reading: 0 Writing: 18 Waiting : 6 + +Note that if you are trying to access the status page from a remote machine, you will need to whitelist the remote machine’s IP address in your status configuration, just as 127.0.0.1 is whitelisted in the configuration snippet above. + +The NGINX status page is an easy way to get a quick snapshot of your metrics, but for continuous monitoring you will need to automatically record that data at regular intervals. Parsers for the NGINX status page already exist for monitoring tools such as [Nagios][4] and [Datadog][5], as well as for the statistics collection daemon [collectD][6]. + +#### Metrics collection: NGINX Plus #### + +The commercial NGINX Plus provides [many more metrics][7] through its ngx_http_status_module than are available in open-source NGINX. Among the additional metrics exposed by NGINX Plus are bytes streamed, as well as information about upstream systems and caches. NGINX Plus also reports counts of all HTTP status code types (1xx, 2xx, 3xx, 4xx, 5xx). A sample NGINX Plus status board is available [here][8]. + +![NGINX Plus status board](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/status_plus-2.png) + +*Note: the “Active” connections on the NGINX Plus status dashboard are defined slightly differently than the Active state connections in the metrics collected via the open-source NGINX stub status module. In NGINX Plus metrics, Active connections do not include connections in the Waiting state (aka Idle connections).* + +NGINX Plus also reports [metrics in JSON format][9] for easy integration with other monitoring systems. 
With NGINX Plus, you can see the metrics and health status [for a given upstream grouping of servers][10], or drill down to get a count of just the response codes [from a single server][11] in that upstream: + + {"1xx":0,"2xx":3483032,"3xx":0,"4xx":23,"5xx":0,"total":3483055} + +To enable the NGINX Plus metrics dashboard, you can add a status server block inside the http block of your NGINX configuration. ([See the section above][12] on collecting metrics from open-source NGINX for instructions on locating the relevant config files.) For example, to set up a status dashboard at http://your.ip.address:8080/status.html and a JSON interface at http://your.ip.address:8080/status, you would add the following server block: + + server { + listen 8080; + root /usr/share/nginx/html; + + location /status { + status; + } + + location = /status.html { + } + } + +The status pages should be live once you reload your NGINX configuration: + + nginx -s reload + +The official NGINX Plus docs have [more details][13] on how to configure the expanded status module. + +#### Metrics collection: NGINX logs #### + +NGINX’s [log module][14] writes configurable access logs to a destination of your choosing. You can customize the format of your logs and the data they contain by [adding or subtracting variables][15]. The simplest way to capture detailed logs is to add the following line in the server block of your config file (see [the section][16] on collecting metrics from open-source NGINX for instructions on locating your config files): + + access_log logs/host.access.log combined; + +After changing any NGINX configurations, reload the configs by executing: + + nginx -s reload + +The “combined” log format, included by default, captures [a number of key data points][17], such as the actual HTTP request and the corresponding response code. In the example logs below, NGINX logged a 200 (success) status code for a request for /index.html and a 404 (not found) error for the nonexistent /fail. + + 127.0.0.1 - - [19/Feb/2015:12:10:46 -0500] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari 537.36" + + 127.0.0.1 - - [19/Feb/2015:12:11:05 -0500] "GET /fail HTTP/1.1" 404 570 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36" + +You can log request processing time as well by adding a new log format to the http block of your NGINX config file: + + log_format nginx '$remote_addr - $remote_user [$time_local] ' + '"$request" $status $body_bytes_sent $request_time ' + '"$http_referer" "$http_user_agent"'; + +And by adding or modifying the access_log line in the server block of your config file: + + access_log logs/host.access.log nginx; + +After reloading the updated configs (by running nginx -s reload), your access logs will include response times, as seen below. The units are seconds, with millisecond resolution. In this instance, the server received a request for /big.pdf, returning a 206 (success) status code after sending 33973115 bytes. Processing the request took 0.202 seconds (202 milliseconds): + + 127.0.0.1 - - [19/Feb/2015:15:50:36 -0500] "GET /big.pdf HTTP/1.1" 206 33973115 0.202 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36" + +You can use a variety of tools and services to parse and analyze NGINX logs. 
For instance, [rsyslog][18] can monitor your logs and pass them to any number of log-analytics services; you can use a free, open-source tool such as [logstash][19] to collect and analyze logs; or you can use a unified logging layer such as [Fluentd][20] to collect and parse your NGINX logs. + +### Conclusion ### + +Which NGINX metrics you monitor will depend on the tools available to you, and whether the insight provided by a given metric justifies the overhead of monitoring that metric. For instance, is measuring error rates important enough to your organization to justify investing in NGINX Plus or implementing a system to capture and analyze logs? + +At Datadog, we have built integrations with both NGINX and NGINX Plus so that you can begin collecting and monitoring metrics from all your web servers with a minimum of setup. Learn how to monitor NGINX with Datadog [in this post][21], and get started right away with a [free trial of Datadog][22]. + +---------- + +Source Markdown for this post is available [on GitHub][23]. Questions, corrections, additions, etc.? Please [let us know][24]. + +-------------------------------------------------------------------------------- + +via: https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/ + +作者:K Young +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出 + +[1]:https://www.datadoghq.com/blog/how-to-monitor-nginx/ +[2]:http://nginx.org/en/docs/http/ngx_http_stub_status_module.html +[3]:http://wiki.nginx.org/InstallOptions +[4]:https://exchange.nagios.org/directory/Plugins/Web-Servers/nginx +[5]:http://docs.datadoghq.com/integrations/nginx/ +[6]:https://collectd.org/wiki/index.php/Plugin:nginx +[7]:http://nginx.org/en/docs/http/ngx_http_status_module.html#data +[8]:http://demo.nginx.com/status.html +[9]:http://demo.nginx.com/status +[10]:http://demo.nginx.com/status/upstreams/demoupstreams +[11]:http://demo.nginx.com/status/upstreams/demoupstreams/0/responses +[12]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#open-source +[13]:http://nginx.org/en/docs/http/ngx_http_status_module.html#example +[14]:http://nginx.org/en/docs/http/ngx_http_log_module.html +[15]:http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format +[16]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#open-source +[17]:http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format +[18]:http://www.rsyslog.com/ +[19]:https://www.elastic.co/products/logstash +[20]:http://www.fluentd.org/ +[21]:https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/ +[22]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#sign-up +[23]:https://github.com/DataDog/the-monitor/blob/master/nginx/how_to_collect_nginx_metrics.md +[24]:https://github.com/DataDog/the-monitor/issues \ No newline at end of file diff --git a/sources/tech/20150717 How to monitor NGINX with Datadog - Part 3.md b/sources/tech/20150717 How to monitor NGINX with Datadog - Part 3.md new file mode 100644 index 0000000000..949fd3d949 --- /dev/null +++ b/sources/tech/20150717 How to monitor NGINX with Datadog - Part 3.md @@ -0,0 +1,150 @@ +How to monitor NGINX with Datadog - Part 3 +================================================================================ +![](http://www.datadoghq.com/wp-content/uploads/2015/07/NGINX_hero_3.png) + +If you’ve already read [our post on monitoring NGINX][1], you know how much information you can gain about your web 
environment from just a handful of metrics. And you’ve also seen just how easy it is to start collecting metrics from NGINX on ad hoc basis. But to implement comprehensive, ongoing NGINX monitoring, you will need a robust monitoring system to store and visualize your metrics, and to alert you when anomalies happen. In this post, we’ll show you how to set up NGINX monitoring in Datadog so that you can view your metrics on customizable dashboards like this: + +![NGINX dashboard](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/nginx_board_5.png) + +Datadog allows you to build graphs and alerts around individual hosts, services, processes, metrics—or virtually any combination thereof. For instance, you can monitor all of your NGINX hosts, or all hosts in a certain availability zone, or you can monitor a single key metric being reported by all hosts with a certain tag. This post will show you how to: + +- Monitor NGINX metrics on Datadog dashboards, alongside all your other systems +- Set up automated alerts to notify you when a key metric changes dramatically + +### Configuring NGINX ### + +To collect metrics from NGINX, you first need to ensure that NGINX has an enabled status module and a URL for reporting its status metrics. Step-by-step instructions [for configuring open-source NGINX][2] and [NGINX Plus][3] are available in our companion post on metric collection. + +### Integrating Datadog and NGINX ### + +#### Install the Datadog Agent #### + +The Datadog Agent is [the open-source software][4] that collects and reports metrics from your hosts so that you can view and monitor them in Datadog. Installing the agent usually takes [just a single command][5]. + +As soon as your Agent is up and running, you should see your host reporting metrics [in your Datadog account][6]. + +![Datadog infrastructure list](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/infra_2.png) + +#### Configure the Agent #### + +Next you’ll need to create a simple NGINX configuration file for the Agent. The location of the Agent’s configuration directory for your OS can be found [here][7]. + +Inside that directory, at conf.d/nginx.yaml.example, you will find [a sample NGINX config file][8] that you can edit to provide the status URL and optional tags for each of your NGINX instances: + + init_config: + + instances: + + - nginx_status_url: http://localhost/nginx_status/ + tags: + - instance:foo + +Once you have supplied the status URLs and any tags, save the config file as conf.d/nginx.yaml. + +#### Restart the Agent #### + +You must restart the Agent to load your new configuration file. The restart command varies somewhat by platform—see the specific commands for your platform [here][9]. + +#### Verify the configuration settings #### + +To check that Datadog and NGINX are properly integrated, run the Datadog info command. The command for each platform is available [here][10]. + +If the configuration is correct, you will see a section like this in the output: + + Checks + ====== + + [...] + + nginx + ----- + - instance #0 [OK] + - Collected 8 metrics & 0 events + +#### Install the integration #### + +Finally, switch on the NGINX integration inside your Datadog account. It’s as simple as clicking the “Install Integration” button under the Configuration tab in the [NGINX integration settings][11]. + +![Install integration](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/install.png) + +### Metrics! 
### + +Once the Agent begins reporting NGINX metrics, you will see [an NGINX dashboard][12] among your list of available dashboards in Datadog. + +The basic NGINX dashboard displays a handful of graphs encapsulating most of the key metrics highlighted [in our introduction to NGINX monitoring][13]. (Some metrics, notably request processing time, require log analysis and are not available in Datadog.) + +You can easily create a comprehensive dashboard for monitoring your entire web stack by adding additional graphs with important metrics from outside NGINX. For example, you might want to monitor host-level metrics on your NGINX hosts, such as system load. To start building a custom dashboard, simply clone the default NGINX dashboard by clicking on the gear near the upper right of the dashboard and selecting “Clone Dash”. + +![Clone dash](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/clone_2.png) + +You can also monitor your NGINX instances at a higher level using Datadog’s [Host Maps][14]—for instance, color-coding all your NGINX hosts by CPU usage to identify potential hotspots. + +![](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/nginx-host-map-3.png) + +### Alerting on NGINX metrics ### + +Once Datadog is capturing and visualizing your metrics, you will likely want to set up some monitors to automatically keep tabs on your metrics—and to alert you when there are problems. Below we’ll walk through a representative example: a metric monitor that alerts on sudden drops in NGINX throughput. + +#### Monitor your NGINX throughput #### + +Datadog metric alerts can be threshold-based (alert when the metric exceeds a set value) or change-based (alert when the metric changes by a certain amount). In this case we’ll take the latter approach, alerting when our incoming requests per second drop precipitously. Such drops are often indicative of problems. + +1.**Create a new metric monitor**. Select “New Monitor” from the “Monitors” dropdown in Datadog. Select “Metric” as the monitor type. + +![NGINX metric monitor](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/monitor2_step_1.png) + +2.**Define your metric monitor**. We want to know when our total NGINX requests per second drop by a certain amount. So we define the metric of interest to be the sum of nginx.net.request_per_s across our infrastructure. + +![NGINX metric](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/monitor2_step_2.png) + +3.**Set metric alert conditions**. Since we want to alert on a change, rather than on a fixed threshold, we select “Change Alert.” We’ll set the monitor to alert us whenever the request volume drops by 30 percent or more. Here we use a one-minute window of data to represent the metric’s value “now” and alert on the average change across that interval, as compared to the metric’s value 10 minutes prior. + +![NGINX metric change alert](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/monitor2_step_3.png) + +4.**Customize the notification**. If our NGINX request volume drops, we want to notify our team. In this case we will post a notification in the ops team’s chat room and page the engineer on call. In “Say what’s happening”, we name the monitor and add a short message that will accompany the notification to suggest a first step for investigation. 
We @mention the Slack channel that we use for ops and use @pagerduty to [route the alert to PagerDuty][15] + +![NGINX metric notification](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/monitor2_step_4v3.png) + +5.**Save the integration monitor**. Click the “Save” button at the bottom of the page. You’re now monitoring a key NGINX [work metric][16], and your on-call engineer will be paged anytime it drops rapidly. + +### Conclusion ### + +In this post we’ve walked you through integrating NGINX with Datadog to visualize your key metrics and notify your team when your web infrastructure shows signs of trouble. + +If you’ve followed along using your own Datadog account, you should now have greatly improved visibility into what’s happening in your web environment, as well as the ability to create automated monitors tailored to your environment, your usage patterns, and the metrics that are most valuable to your organization. + +If you don’t yet have a Datadog account, you can sign up for [a free trial][17] and start monitoring your infrastructure, your applications, and your services today. + +---------- + +Source Markdown for this post is available [on GitHub][18]. Questions, corrections, additions, etc.? Please [let us know][19]. + +------------------------------------------------------------ + +via: https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/ + +作者:K Young +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出 + +[1]:https://www.datadoghq.com/blog/how-to-monitor-nginx/ +[2]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#open-source +[3]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/#plus +[4]:https://github.com/DataDog/dd-agent +[5]:https://app.datadoghq.com/account/settings#agent +[6]:https://app.datadoghq.com/infrastructure +[7]:http://docs.datadoghq.com/guides/basic_agent_usage/ +[8]:https://github.com/DataDog/dd-agent/blob/master/conf.d/nginx.yaml.example +[9]:http://docs.datadoghq.com/guides/basic_agent_usage/ +[10]:http://docs.datadoghq.com/guides/basic_agent_usage/ +[11]:https://app.datadoghq.com/account/settings#integrations/nginx +[12]:https://app.datadoghq.com/dash/integration/nginx +[13]:https://www.datadoghq.com/blog/how-to-monitor-nginx/ +[14]:https://www.datadoghq.com/blog/introducing-host-maps-know-thy-infrastructure/ +[15]:https://www.datadoghq.com/blog/pagerduty/ +[16]:https://www.datadoghq.com/blog/monitoring-101-collecting-data/#metrics +[17]:https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/#sign-up +[18]:https://github.com/DataDog/the-monitor/blob/master/nginx/how_to_monitor_nginx_with_datadog.md +[19]:https://github.com/DataDog/the-monitor/issues \ No newline at end of file diff --git a/sources/tech/20150717 How to monitor NGINX- Part 1.md b/sources/tech/20150717 How to monitor NGINX- Part 1.md new file mode 100644 index 0000000000..25270eb5cb --- /dev/null +++ b/sources/tech/20150717 How to monitor NGINX- Part 1.md @@ -0,0 +1,408 @@ +How to monitor NGINX - Part 1 +================================================================================ +![](http://www.datadoghq.com/wp-content/uploads/2015/07/NGINX_hero_1.png) + +### What is NGINX? ### + +[NGINX][1] (pronounced “engine X”) is a popular HTTP server and reverse proxy server. As an HTTP server, NGINX serves static content very efficiently and reliably, using relatively little memory. 
As a [reverse proxy][2], it can be used as a single, controlled point of access for multiple back-end servers or for additional applications such as caching and load balancing. NGINX is available as a free, open-source product or in a more full-featured, commercially distributed version called NGINX Plus. + +NGINX can also be used as a mail proxy and a generic TCP proxy, but this article does not directly address NGINX monitoring for these use cases. + +### Key NGINX metrics ### + +By monitoring NGINX you can catch two categories of issues: resource issues within NGINX itself, and also problems developing elsewhere in your web infrastructure. Some of the metrics most NGINX users will benefit from monitoring include **requests per second**, which provides a high-level view of combined end-user activity; **server error rate**, which indicates how often your servers are failing to process seemingly valid requests; and **request processing time**, which describes how long your servers are taking to process client requests (and which can point to slowdowns or other problems in your environment). + +More generally, there are at least three key categories of metrics to watch: + +- Basic activity metrics +- Error metrics +- Performance metrics + +Below we’ll break down a few of the most important NGINX metrics in each category, as well as metrics for a fairly common use case that deserves special mention: using NGINX Plus for reverse proxying. We will also describe how you can monitor all of these metrics with your graphing or monitoring tools of choice. + +This article references metric terminology [introduced in our Monitoring 101 series][3], which provides a framework for metric collection and alerting. + +#### Basic activity metrics #### + +Whatever your NGINX use case, you will no doubt want to monitor how many client requests your servers are receiving and how those requests are being processed. + +NGINX Plus can report basic activity metrics exactly like open-source NGINX, but it also provides a secondary module that reports metrics slightly differently. We discuss open-source NGINX first, then the additional reporting capabilities provided by NGINX Plus. + +**NGINX** + +The diagram below shows the lifecycle of a client connection and how the open-source version of NGINX collects metrics during a connection. + +![connection, request states](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/nginx_connection_diagram-2.png) + +Accepts, handled, and requests are ever-increasing counters. Active, waiting, reading, and writing grow and shrink with request volume. + +注:表格 + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | Metric type |
| --- | --- | --- |
| accepts | Count of client connections attempted by NGINX | Resource: Utilization |
| handled | Count of successful client connections | Resource: Utilization |
| active | Currently active client connections | Resource: Utilization |
| dropped (calculated) | Count of dropped connections (accepts – handled) | Work: Errors* |
| requests | Count of client requests | Work: Throughput |

*Strictly speaking, dropped connections is a metric of resource saturation, but since saturation causes NGINX to stop servicing some work (rather than queuing it up for later), “dropped” is best thought of as a work metric.
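To make the relationships between these counters concrete, here is a minimal Python sketch that pulls the stub status page described in our companion post on metric collection and derives each metric in the table above, including the calculated dropped count. The status URL is an assumption (http://127.0.0.1/nginx_status); point it at wherever you exposed the page.

    import re
    import urllib.request

    STATUS_URL = "http://127.0.0.1/nginx_status"  # assumption: locally exposed status page

    def stub_status_metrics(url=STATUS_URL):
        """Parse the stub_status page into the counters and gauges listed above."""
        text = urllib.request.urlopen(url).read().decode("utf-8")
        active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
        accepts, handled, requests = map(
            int, re.search(r"\s(\d+)\s+(\d+)\s+(\d+)\s", text).groups()
        )
        reading, writing, waiting = (
            int(n) for n in re.findall(r"(?:Reading|Writing|Waiting)\s*:\s*(\d+)", text)
        )
        return {
            "accepts": accepts,            # counter
            "handled": handled,            # counter
            "active": active,              # gauge
            "dropped": accepts - handled,  # calculated; should normally be zero
            "requests": requests,          # counter
            "reading": reading,
            "writing": writing,
            "waiting": waiting,
        }

    if __name__ == "__main__":
        print(stub_status_metrics())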
+ +The **accepts** counter is incremented when an NGINX worker picks up a request for a connection from the OS, whereas **handled** is incremented when the worker actually gets a connection for the request (by establishing a new connection or reusing an open one). These two counts are usually the same—any divergence indicates that connections are being **dropped**, often because a resource limit, such as NGINX’s [worker_connections][4] limit, has been reached. + +Once NGINX successfully handles a connection, the connection moves to an **active** state, where it remains as client requests are processed: + +Active state + +- **Waiting**: An active connection may also be in a Waiting substate if there is no active request at the moment. New connections can bypass this state and move directly to Reading, most commonly when using “accept filter” or “deferred accept”, in which case NGINX does not receive notice of work until it has enough data to begin working on the response. Connections will also be in the Waiting state after sending a response if the connection is set to keep-alive. +- **Reading**: When a request is received, the connection moves out of the waiting state, and the request itself is counted as Reading. In this state NGINX is reading a client request header. Request headers are lightweight, so this is usually a fast operation. +- **Writing**: After the request is read, it is counted as Writing, and remains in that state until a response is returned to the client. That means that the request is Writing while NGINX is waiting for results from upstream systems (systems “behind” NGINX), and while NGINX is operating on the response. Requests will often spend the majority of their time in the Writing state. + +Often a connection will only support one request at a time. In this case, the number of Active connections == Waiting connections + Reading requests + Writing requests. However, the newer SPDY and HTTP/2 protocols allow multiple concurrent requests/responses to be multiplexed over a connection, so Active may be less than the sum of Waiting, Reading, and Writing. (As of this writing, NGINX does not support HTTP/2, but expects to add support during 2015.) + +**NGINX Plus** + +As mentioned above, all of open-source NGINX’s metrics are available within NGINX Plus, but Plus can also report additional metrics. The section covers the metrics that are only available from NGINX Plus. + +![connection, request states](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/nginx_plus_connection_diagram-2.png) + +Accepted, dropped, and total are ever-increasing counters. Active, idle, and current track the current number of connections or requests in each of those states, so they grow and shrink with request volume. + +注:表格 + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | Metric type |
| --- | --- | --- |
| accepted | Count of client connections attempted by NGINX | Resource: Utilization |
| dropped | Count of dropped connections | Work: Errors* |
| active | Currently active client connections | Resource: Utilization |
| idle | Client connections with zero current requests | Resource: Utilization |
| total | Count of client requests | Work: Throughput |

*Strictly speaking, dropped connections is a metric of resource saturation, but since saturation causes NGINX to stop servicing some work (rather than queuing it up for later), “dropped” is best thought of as a work metric.
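A similar sketch, again hypothetical, reads the same family of metrics from the NGINX Plus JSON interface. The endpoint (http://127.0.0.1:8080/status) and the connections/requests field names below follow the NGINX Plus status JSON as documented for this module, but check them against your own version before relying on them.

    import json
    import urllib.request

    PLUS_STATUS_URL = "http://127.0.0.1:8080/status"  # assumption: JSON status endpoint

    def plus_status_metrics(url=PLUS_STATUS_URL):
        """Read the connection and request counters reported by NGINX Plus."""
        status = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
        conns, reqs = status["connections"], status["requests"]
        metrics = {
            "accepted": conns["accepted"],  # counter
            "dropped": conns["dropped"],    # reported directly, no calculation needed
            "active": conns["active"],      # excludes idle connections
            "idle": conns["idle"],          # stub_status "waiting" equivalent
            "total": reqs["total"],         # counter of client requests
            "current": reqs["current"],     # stub_status "reading + writing" equivalent
        }
        # (total / accepted) yields the average number of requests per connection
        metrics["requests_per_connection"] = metrics["total"] / max(metrics["accepted"], 1)
        return metrics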
+ +The **accepted** counter is incremented when an NGINX Plus worker picks up a request for a connection from the OS. If the worker fails to get a connection for the request (by establishing a new connection or reusing an open one), then the connection is dropped and **dropped** is incremented. Ordinarily connections are dropped because a resource limit, such as NGINX Plus’s [worker_connections][4] limit, has been reached. + +**Active** and **idle** are the same as “active” and “waiting” states in open-source NGINX as described [above][5], with one key exception: in open-source NGINX, “waiting” falls under the “active” umbrella, whereas in NGINX Plus “idle” connections are excluded from the “active” count. **Current** is the same as the combined “reading + writing” states in open-source NGINX. + +**Total** is a cumulative count of client requests. Note that a single client connection can involve multiple requests, so this number may be significantly larger than the cumulative number of connections. In fact, (total / accepted) yields the average number of requests per connection. + +**Metric differences between Open-Source and Plus** + +注:表格 + +++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| NGINX (open-source) | NGINX Plus |
| --- | --- |
| accepts | accepted |
| dropped must be calculated | dropped is reported directly |
| reading + writing | current |
| waiting | idle |
| active (includes “waiting” states) | active (excludes “idle” states) |
| requests | total |
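If you collect from a mix of open-source and Plus servers, it can help to normalize both variants onto a single set of names before graphing them side by side. The sketch below applies the mapping from the table above to the dictionaries returned by the two collection sketches earlier in this section; the common metric names are illustrative, not part of either NGINX interface.

    def normalize_nginx_metrics(raw, plus=False):
        """Map raw NGINX metrics onto one common, illustrative set of names."""
        if plus:
            # NGINX Plus reports every value in the right-hand column directly
            return {
                "connections_accepted": raw["accepted"],
                "connections_dropped": raw["dropped"],
                "connections_active": raw["active"],
                "connections_idle": raw["idle"],
                "requests_current": raw["current"],
                "requests_total": raw["total"],
            }
        # open-source stub_status values need two adjustments to line up
        return {
            "connections_accepted": raw["accepts"],
            "connections_dropped": raw["accepts"] - raw["handled"],  # must be calculated
            "connections_active": raw["active"] - raw["waiting"],    # exclude waiting connections
            "connections_idle": raw["waiting"],
            "requests_current": raw["reading"] + raw["writing"],
            "requests_total": raw["requests"],
        }

Once both flavors report under the same names, the dashboards and alerts discussed below can be defined once for your whole fleet.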
+ +**Metric to alert on: Dropped connections** + +The number of connections that have been dropped is equal to the difference between accepts and handled (NGINX) or is exposed directly as a standard metric (NGINX Plus). Under normal circumstances, dropped connections should be zero. If your rate of dropped connections per unit time starts to rise, look for possible resource saturation. + +![Dropped connections](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/dropped_connections.png) + +**Metric to alert on: Requests per second** + +Sampling your request data (**requests** in open-source, or **total** in Plus) with a fixed time interval provides you with the number of requests you’re receiving per unit of time—often minutes or seconds. Monitoring this metric can alert you to spikes in incoming web traffic, whether legitimate or nefarious, or sudden drops, which are usually indicative of problems. A drastic change in requests per second can alert you to problems brewing somewhere in your environment, even if it cannot tell you exactly where those problems lie. Note that all requests are counted the same, regardless of their URLs. + +![Requests per second](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/requests_per_sec.png) + +**Collecting activity metrics** + +Open-source NGINX exposes these basic server metrics on a simple status page. Because the status information is displayed in a standardized form, virtually any graphing or monitoring tool can be configured to parse the relevant data for analysis, visualization, or alerting. NGINX Plus provides a JSON feed with much richer data. Read the companion post on [NGINX metrics collection][6] for instructions on enabling metrics collection. + +#### Error metrics #### + +注:表格 + +++++ + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | Metric type | Availability |
| --- | --- | --- | --- |
| 4xx codes | Count of client errors | Work: Errors | NGINX logs, NGINX Plus |
| 5xx codes | Count of server errors | Work: Errors | NGINX logs, NGINX Plus |
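As a concrete example of the log-based route, the following sketch tallies responses by status-code class from a combined-format access log. The log path matches the example used in our companion post and is an assumption; a production collector would also bucket the counts by time interval.

    import re
    from collections import Counter

    # in the combined format, the status code is the first field after the quoted request
    STATUS_CODE = re.compile(r'" (\d{3}) ')

    def error_counts(path="logs/host.access.log"):  # assumption: log path from the companion post
        """Count responses by status-code class (2xx, 4xx, 5xx, ...) in an access log."""
        counts = Counter()
        with open(path) as logfile:
            for line in logfile:
                match = STATUS_CODE.search(line)
                if match:
                    counts[match.group(1)[0] + "xx"] += 1
        return counts

    codes = error_counts()
    print("client errors (4xx):", codes["4xx"], "server errors (5xx):", codes["5xx"])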
+ +NGINX error metrics tell you how often your servers are returning errors instead of producing useful work. Client errors are represented by 4xx status codes, server errors with 5xx status codes. + +**Metric to alert on: Server error rate** + +Your server error rate is equal to the number of 5xx errors divided by the total number of [status codes][7] (1xx, 2xx, 3xx, 4xx, 5xx), per unit of time (often one to five minutes). If your error rate starts to climb over time, investigation may be in order. If it spikes suddenly, urgent action may be required, as clients are likely to report errors to the end user. + +![Server error rate](https://d33tyra1llx9zy.cloudfront.net/blog/images/2015-06-nginx/5xx_rate.png) + +A note on client errors: while it is tempting to monitor 4xx, there is limited information you can derive from that metric since it measures client behavior without offering any insight into particular URLs. In other words, a change in 4xx could be noise, e.g. web scanners blindly looking for vulnerabilities. + +**Collecting error metrics** + +Although open-source NGINX does not make error rates immediately available for monitoring, there are at least two ways to capture that information: + +- Use the expanded status module available with commercially supported NGINX Plus +- Configure NGINX’s log module to write response codes in access logs + +Read the companion post on NGINX metrics collection for detailed instructions on both approaches. + +#### Performance metrics #### + +注:表格 + +++++ + + + + + + + + + + + + + + + + +
| Name | Description | Metric type | Availability |
| --- | --- | --- | --- |
| request time | Time to process each request, in seconds | Work: Performance | NGINX logs |
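As a rough illustration, the sketch below summarizes $request_time values from an access log written with the custom log format shown in our companion post, where the timing field follows the byte count. The field position and log path are assumptions based on that format.

    import re
    import statistics

    # with the custom format, $request_time follows "$request" $status $body_bytes_sent
    REQUEST_TIME = re.compile(r'" \d{3} \d+ ([0-9.]+) ')

    def request_time_summary(path="logs/host.access.log"):  # assumption: log path
        """Summarize request processing times (in seconds) from an access log."""
        times = sorted(
            float(m.group(1))
            for m in (REQUEST_TIME.search(line) for line in open(path))
            if m
        )
        if not times:
            return None
        return {
            "count": len(times),
            "avg_s": statistics.mean(times),
            "p95_s": times[int(0.95 * (len(times) - 1))],  # nearest-rank 95th percentile
            "max_s": times[-1],
        }

    print(request_time_summary())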
+ +**Metric to alert on: Request processing time** + +The request time metric logged by NGINX records the processing time for each request, from the reading of the first client bytes to fulfilling the request. Long response times can point to problems upstream. + +**Collecting processing time metrics** + +NGINX and NGINX Plus users can capture data on processing time by adding the $request_time variable to the access log format. More details on configuring logs for monitoring are available in our companion post on [NGINX metrics collection][8]. + +#### Reverse proxy metrics #### + +注:表格 + +++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | Metric type | Availability |
| --- | --- | --- | --- |
| Active connections by upstream server | Currently active client connections | Resource: Utilization | NGINX Plus |
| 5xx codes by upstream server | Server errors | Work: Errors | NGINX Plus |
| Available servers per upstream group | Servers passing health checks | Resource: Availability | NGINX Plus |
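One possible way to consume the upstream metrics listed above is to read them from the NGINX Plus JSON interface and roll them up per group, as in the sketch below. The endpoint and field names are assumptions modeled on the demo output linked in our companion post and may differ across NGINX Plus versions.

    import json
    import urllib.request

    UPSTREAMS_URL = "http://127.0.0.1:8080/status/upstreams"  # assumption: JSON endpoint

    def upstream_summary(url=UPSTREAMS_URL):
        """Summarize per-server health and errors for each upstream group."""
        upstreams = json.loads(urllib.request.urlopen(url).read().decode("utf-8"))
        summary = {}
        for group, servers in upstreams.items():
            # some NGINX Plus versions nest the server list under a "peers" key
            peers = servers.get("peers", []) if isinstance(servers, dict) else servers
            summary[group] = {
                "servers_up": sum(1 for s in peers if s.get("state") == "up"),
                "active_by_server": {s["server"]: s.get("active", 0) for s in peers},
                "5xx_rate_by_server": {
                    s["server"]: s["responses"]["5xx"] / max(s["responses"]["total"], 1)
                    for s in peers
                },
            }
        return summary

    print(upstream_summary())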
+ +One of the most common ways to use NGINX is as a [reverse proxy][9]. The commercially supported NGINX Plus exposes a large number of metrics about backend (or “upstream”) servers, which are relevant to a reverse proxy setup. This section highlights a few of the key upstream metrics that are available to users of NGINX Plus. + +NGINX Plus segments its upstream metrics first by group, and then by individual server. So if, for example, your reverse proxy is distributing requests to five upstream web servers, you can see at a glance whether any of those individual servers is overburdened, and also whether you have enough healthy servers in the upstream group to ensure good response times. + +**Activity metrics** + +The number of **active connections per upstream server** can help you verify that your reverse proxy is properly distributing work across your server group. If you are using NGINX as a load balancer, significant deviations in the number of connections handled by any one server can indicate that the server is struggling to process requests in a timely manner or that the load-balancing method (e.g., [round-robin or IP hashing][10]) you have configured is not optimal for your traffic patterns + +**Error metrics** + +Recall from the error metric section above that 5xx (server error) codes are a valuable metric to monitor, particularly as a share of total response codes. NGINX Plus allows you to easily extract the number of **5xx codes per upstream server**, as well as the total number of responses, to determine that particular server’s error rate. + +**Availability metrics** + +For another view of the health of your web servers, NGINX also makes it simple to monitor the health of your upstream groups via the total number of **servers currently available within each group**. In a large reverse proxy setup, you may not care very much about the current state of any one server, just as long as your pool of available servers is capable of handling the load. But monitoring the total number of servers that are up within each upstream group can provide a very high-level view of the aggregate health of your web servers. + +**Collecting upstream metrics** + +NGINX Plus upstream metrics are exposed on the internal NGINX Plus monitoring dashboard, and are also available via a JSON interface that can serve up metrics into virtually any external monitoring platform. See examples in our companion post on [collecting NGINX metrics][11]. + +### Conclusion ### + +In this post we’ve touched on some of the most useful metrics you can monitor to keep tabs on your NGINX servers. If you are just getting started with NGINX, monitoring most or all of the metrics in the list below will provide good visibility into the health and activity levels of your web infrastructure: + +- [Dropped connections][12] +- [Requests per second][13] +- [Server error rate][14] +- [Request processing time][15] + +Eventually you will recognize additional, more specialized metrics that are particularly relevant to your own infrastructure and use cases. Of course, what you monitor will depend on the tools you have and the metrics available to you. See the companion post for [step-by-step instructions on metric collection][16], whether you use NGINX or NGINX Plus. + +At Datadog, we have built integrations with both NGINX and NGINX Plus so that you can begin collecting and monitoring metrics from all your web servers with a minimum of setup. 
Learn how to monitor NGINX with Datadog [in this post][17], and get started right away with [a free trial of Datadog][18]. + +### Acknowledgments ### + +Many thanks to the NGINX team for reviewing this article prior to publication and providing important feedback and clarifications. + +---------- + +Source Markdown for this post is available [on GitHub][19]. Questions, corrections, additions, etc.? Please [let us know][20]. + +-------------------------------------------------------------------------------- + +via: https://www.datadoghq.com/blog/how-to-monitor-nginx/ + +作者:K Young +译者:[译者ID](https://github.com/译者ID) +校对:[校对者ID](https://github.com/校对者ID) + +本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出 + +[1]:http://nginx.org/en/ +[2]:http://nginx.com/resources/glossary/reverse-proxy-server/ +[3]:https://www.datadoghq.com/blog/monitoring-101-collecting-data/ +[4]:http://nginx.org/en/docs/ngx_core_module.html#worker_connections +[5]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#active-state +[6]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/ +[7]:http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html +[8]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/ +[9]:https://en.wikipedia.org/wiki/Reverse_proxy +[10]:http://nginx.com/blog/load-balancing-with-nginx-plus/ +[11]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/ +[12]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#dropped-connections +[13]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#requests-per-second +[14]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#server-error-rate +[15]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#request-processing-time +[16]:https://www.datadoghq.com/blog/how-to-collect-nginx-metrics/ +[17]:https://www.datadoghq.com/blog/how-to-monitor-nginx-with-datadog/ +[18]:https://www.datadoghq.com/blog/how-to-monitor-nginx/#sign-up +[19]:https://github.com/DataDog/the-monitor/blob/master/nginx/how_to_monitor_nginx.md +[20]:https://github.com/DataDog/the-monitor/issues \ No newline at end of file