mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00
Merge pull request #10247 from heguangzhi/translate-20180911
translated
This commit is contained in:
commit
042eb7d4b2
@ -1,112 +0,0 @@
|
||||
heguangzhi Translating
|
||||
|
||||
3 open source log aggregation tools
|
||||
======
|
||||
Log aggregation systems can help with troubleshooting and other tasks. Here are three top options.
|
||||
|
||||
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/data_metrics_analytics_desktop_laptop.png?itok=9QXd7AUr)
|
||||
|
||||
How is metrics aggregation different from log aggregation? Can’t logs include metrics? Can’t log aggregation systems do the same things as metrics aggregation systems?
|
||||
|
||||
These are questions I hear often. I’ve also seen vendors pitching their log aggregation system as the solution to all observability problems. Log aggregation is a valuable tool, but it isn’t normally a good tool for time-series data.
|
||||
|
||||
A couple of valuable features in a time-series metrics aggregation system are the regular interval and the storage system customized specifically for time-series data. The regular interval allows a user to derive real mathematical results consistently. If a log aggregation system is collecting metrics in a regular interval, it can potentially work the same way. However, the storage system isn’t optimized for the types of queries that are typical in a metrics aggregation system. These queries will take more resources and time to process using storage systems found in log aggregation tools.
|
||||
|
||||
So, we know a log aggregation system is likely not suitable for time-series data, but what is it good for? A log aggregation system is a great place for collecting event data. These are irregular activities that are significant. An example might be access logs for a web service. These are significant because we want to know what is accessing our systems and when. Another example would be an application error condition—because it is not a normal operating condition, it might be valuable during troubleshooting.
|
||||
|
||||
A handful of rules for logging:
|
||||
|
||||
* DO include a timestamp
|
||||
* DO format in JSON
|
||||
* DON’T log insignificant events
|
||||
* DO log all application errors
|
||||
* MAYBE log warnings
|
||||
* DO turn on logging
|
||||
* DO write messages in a human-readable form
|
||||
* DON’T log informational data in production
|
||||
* DON’T log anything a human can’t read or react to
|
||||
|
||||
|
||||
|
||||
### Cloud costs
|
||||
|
||||
When investigating log aggregation tools, the cloud might seem like an attractive option. However, it can come with significant costs. Logs represent a lot of data when aggregated across hundreds or thousands of hosts and applications. The ingestion, storage, and retrieval of that data are expensive in cloud-based systems.
|
||||
|
||||
As a point of reference from a real system, a collection of around 500 nodes with a few hundred apps results in 200GB of log data per day. There’s probably room for improvement in that system, but even reducing it by half will cost nearly $10,000 per month in many SaaS offerings. This often includes retention of only 30 days, which isn’t very long if you want to look at trending data year-over-year.
|
||||
|
||||
This isn’t to discourage the use of these systems, as they can be very valuable—especially for smaller organizations. The purpose is to point out that there could be significant costs, and it can be discouraging when they are realized. The rest of this article will focus on open source and commercial solutions that are self-hosted.
|
||||
|
||||
### Tool options
|
||||
|
||||
#### ELK
|
||||
|
||||
[ELK][1], short for Elasticsearch, Logstash, and Kibana, is the most popular open source log aggregation tool on the market. It’s used by Netflix, Facebook, Microsoft, LinkedIn, and Cisco. The three components are all developed and maintained by [Elastic][2]. [Elasticsearch][3] is essentially a NoSQL, Lucene search engine implementation. [Logstash][4] is a log pipeline system that can ingest data, transform it, and load it into a store like Elasticsearch. [Kibana][5] is a visualization layer on top of Elasticsearch.
|
||||
|
||||
A few years ago, Beats were introduced. Beats are data collectors. They simplify the process of shipping data to Logstash. Instead of needing to understand the proper syntax of each type of log, a user can install a Beat that will export NGINX logs or Envoy proxy logs properly so they can be used effectively within Elasticsearch.
|
||||
|
||||
When installing a production-level ELK stack, a few other pieces might be included, like [Kafka][6], [Redis][7], and [NGINX][8]. Also, it is common to replace Logstash with Fluentd, which we’ll discuss later. This system can be complex to operate, which in its early days led to a lot of problems and complaints. These have largely been fixed, but it’s still a complex system, so you might not want to try it if you’re a smaller operation.
|
||||
|
||||
That said, there are services available so you don’t have to worry about that. [Logz.io][9] will run it for you, but its list pricing is a little steep if you have a lot of data. Of course, you’re probably smaller and may not have a lot of data. If you can’t afford Logz.io, you could look at something like [AWS Elasticsearch Service][10] (ES). ES is a service Amazon Web Services (AWS) offers that makes it very easy to get Elasticsearch working quickly. It also has tooling to get all AWS logs into ES using Lambda and S3. This is a much cheaper option, but there is some management required and there are a few limitations.
|
||||
|
||||
Elastic, the parent company of the stack, [offers][11] a more robust product that uses the open core model, which provides additional options around analytics tools, and reporting. It can also be hosted on Google Cloud Platform or AWS. This might be the best option, as this combination of tools and hosting platforms offers a cheaper solution than most SaaS options and still provides a lot of value. This system could effectively replace or give you the capability of a [security information and event management][12] (SIEM) system.
|
||||
|
||||
The ELK stack also offers great visualization tools through Kibana, but it lacks an alerting function. Elastic provides alerting functionality within the paid X-Pack add-on, but there is nothing built in for the open source system. Yelp has created a solution to this problem, called [ElastAlert][13], and there are probably others. This additional piece of software is fairly robust, but it increases the complexity of an already complex system.
|
||||
|
||||
#### Graylog
|
||||
|
||||
[Graylog][14] has recently risen in popularity, but it got its start when Lennart Koopmann created it back in 2010. A company was born with the same name two years later. Despite its increasing use, it still lags far behind the ELK stack. This also means it has fewer community-developed features, but it can use the same Beats that the ELK stack uses. Graylog has gained praise in the Go community with the introduction of the Graylog Collector Sidecar written in [Go][15].
|
||||
|
||||
Graylog uses Elasticsearch, [MongoDB][16], and the Graylog Server under the hood. This makes it as complex to run as the ELK stack and maybe a little more. However, Graylog comes with alerting built into the open source version, as well as several other notable features like streaming, message rewriting, and geolocation.
|
||||
|
||||
The streaming feature allows for data to be routed to specific Streams in real time while they are being processed. With this feature, a user can see all database errors in a single Stream and web server errors in a different Stream. Alerts can even be based on these Streams as new items are added or when a threshold is exceeded. Latency is probably one of the biggest issues with log aggregation systems, and Streams eliminate that issue in Graylog. As soon as the log comes in, it can be routed to other systems through a Stream without being processed fully.
|
||||
|
||||
The message rewriting feature uses the open source rules engine [Drools][17]. This allows all incoming messages to be evaluated against a user-defined rules file enabling a message to be dropped (called Blacklisting), a field to be added or removed, or the message to be modified.
|
||||
|
||||
The coolest feature might be Graylog’s geolocation capability, which supports plotting IP addresses on a map. This is a fairly common feature and is available in Kibana as well, but it adds a lot of value—especially if you want to use this as your SIEM system. The geolocation functionality is provided in the open source version of the system.
|
||||
|
||||
Graylog, the company, charges for support on the open source version if you want it. It also offers an open core model for its Enterprise version that offers archiving, audit logging, and additional support. There aren’t many other options for support or hosting, so you’ll likely be on your own if you don’t use Graylog (the company).
|
||||
|
||||
#### Fluentd
|
||||
|
||||
[Fluentd][18] was developed at [Treasure Data][19], and the [CNCF][20] has adopted it as an Incubating project. It was written in C and Ruby and is recommended by [AWS][21] and [Google Cloud][22]. Fluentd has become a common replacement for Logstash in many installations. It acts as a local aggregator to collect all node logs and send them off to central storage systems. It is not a log aggregation system.
|
||||
|
||||
It uses a robust plugin system to provide quick and easy integrations with different data sources and data outputs. Since there are over 500 plugins available, most of your use cases should be covered. If they aren’t, this sounds like an opportunity to contribute back to the open source community.
|
||||
|
||||
Fluentd is a common choice in Kubernetes environments due to its low memory requirements (just tens of megabytes) and its high throughput. In an environment like [Kubernetes][23], where each pod has a Fluentd sidecar, memory consumption will increase linearly with each new pod created. Using Fluentd will drastically reduce your system utilization. This is becoming a common problem with tools developed in Java that are intended to run one per node where the memory overhead hasn’t been a major issue.
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://opensource.com/article/18/9/open-source-log-aggregation-tools
|
||||
|
||||
作者:[Dan Barker][a]
|
||||
选题:[lujun9972](https://github.com/lujun9972)
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: https://opensource.com/users/barkerd427
|
||||
[1]: https://www.elastic.co/webinars/introduction-elk-stack
|
||||
[2]: https://www.elastic.co/
|
||||
[3]: https://www.elastic.co/products/elasticsearch
|
||||
[4]: https://www.elastic.co/products/logstash
|
||||
[5]: https://www.elastic.co/products/kibana
|
||||
[6]: http://kafka.apache.org/
|
||||
[7]: https://redis.io/
|
||||
[8]: https://www.nginx.com/
|
||||
[9]: https://logz.io/
|
||||
[10]: https://aws.amazon.com/elasticsearch-service/
|
||||
[11]: https://www.elastic.co/cloud
|
||||
[12]: https://en.wikipedia.org/wiki/Security_information_and_event_management
|
||||
[13]: https://github.com/Yelp/elastalert
|
||||
[14]: https://www.graylog.org/
|
||||
[15]: https://opensource.com/tags/go
|
||||
[16]: https://www.mongodb.com/
|
||||
[17]: https://www.drools.org/
|
||||
[18]: https://www.fluentd.org/
|
||||
[19]: https://www.treasuredata.com/
|
||||
[20]: https://www.cncf.io/
|
||||
[21]: https://aws.amazon.com/blogs/aws/all-your-data-fluentd/
|
||||
[22]: https://cloud.google.com/logging/docs/agent/
|
||||
[23]: https://opensource.com/resources/what-is-kubernetes
|
118
translated/tech/20180910 3 open source log aggregation tools.md
Normal file
118
translated/tech/20180910 3 open source log aggregation tools.md
Normal file
@ -0,0 +1,118 @@
|
||||
|
||||
|
||||
|
||||
3个开源日志聚合工具
|
||||
======
|
||||
|
||||
日志聚合系统可以帮助我们故障排除并进行其他的任务。以下是三个主要工具介绍。
|
||||
|
||||
![](https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/data_metrics_analytics_desktop_laptop.png?itok=9QXd7AUr)
|
||||
|
||||
|
||||
|
||||
指标聚合与日志聚合有何不同?日志不能包括指标吗?日志聚合系统不能做与指标聚合系统相同的事情吗?
|
||||
|
||||
这些是我经常听到的问题。我还看到供应商推销他们的日志聚合系统作为所有可观察问题的解决方案。日志聚合是一个有价值的工具,但它通常对时间序列数据的支持不够好。
|
||||
|
||||
时间序列指标聚合系统中几个有价值的功能为规则间隔与专门为时间序列数据定制的存储系统。规则间隔允许用户一次性地导出真实的数据结果。如果要求日志聚合系统定期收集指标数据,它也可以。但是,它的存储系统没有针对指标聚合系统中典型的查询类型进行优化。使用日志聚合工具中的存储系统处理这些查询将花费更多的资源和时间。
|
||||
|
||||
所以,我们知道日志聚合系统可能不适合时间序列数据,但是它有什么好处呢?日志聚合系统是收集事件数据的好地方。这些是非常重要的不规则活动。最好的例子为 web 服务的访问日志。这些都很重要,因为我们想知道什么东西正在访问我们的系统,什么时候访问。另一个例子是应用程序错误记录——因为它不是正常的操作记录,所以在故障排除过程中可能很有价值的。
|
||||
|
||||
日志记录的一些规则:
|
||||
|
||||
* 包含时间戳
|
||||
* 包含 JSON 格式
|
||||
* 不记录无关紧要的事件
|
||||
* 记录所有应用程序的错误
|
||||
* 记录警告错误
|
||||
* 日志记录开关
|
||||
* 以可读的形式记录信息
|
||||
* 不在生产环境中记录信息
|
||||
* 不记录任何无法阅读或无反馈的内容
|
||||
|
||||
|
||||
### 云的成本
|
||||
|
||||
当研究日志聚合工具时,云可能看起来是一个有吸引力的选择。然而,这可能会带来巨大的成本。当跨数百或数千台主机和应用程序聚合时,日志数据是大量的。在基于云的系统中,数据的接收、存储和检索是昂贵的。
|
||||
|
||||
作为一个真实的系统,大约500个节点和几百个应用程序的集合每天产生 200GB 的日志数据。这个系统可能还有改进的空间,但是在许多 SaaS 产品中,即使将它减少一半,每月也要花费将近10,000美元。这通常包括仅保留30天,如果你想查看一年一年的趋势数据,就不可能了。
|
||||
|
||||
并不是要不使用这些系统,尤其是对于较小的组织它们可能非常有价值的。目的是指出可能会有很大的成本,当这些成本达到时,就可能令人非常的沮丧。本文的其余部分将集中讨论自托管的开源和商业解决方案。
|
||||
|
||||
|
||||
### 工具选择
|
||||
|
||||
#### ELK
|
||||
|
||||
[ELK][1] ,简称 Elasticsearch、Logstash 和 Kibana,是最流行的开源日志聚合工具。它被Netflix、Facebook、微软、LinkedIn 和思科使用。这三个组件都是由 [Elastic][2] 开发和维护的。[Elasticsearch][3] 本质上是一个NoSQL,Lucene 搜索引擎实现的。[Logstash][4] 是一个日志管道系统,可以接收数据,转换数据,并将其加载到像 Elasticsearch 这样的应用中。[Kibana][5] 是 Elasticsearch 之上的可视化层。
|
||||
|
||||
几年前,Beats 被引入。Beats 是数据采集器。它们简化了将数据运送到日志存储的过程。用户不需要了解每种日志的正确语法,而是可以安装一个 Beats 来正确导出 NGINX 日志或代理日志,以便在Elasticsearch 中有效地使用它们。
|
||||
|
||||
安装生产环境级 ELK stack 时,可能会包括其他几个部分,如 [Kafka][6], [Redis][7], and [NGINX][8]。此外,用 Fluentd 替换 Logstash 也很常见,我们将在后面讨论。这个系统操作起来很复杂,这在早期导致很多问题和投诉。目前,这些基本上已经被修复,不过它仍然是一个复杂的系统,如果你使用少部分的功能,建议不要使用它了。
|
||||
|
||||
也就是说,服务是可用的,所以你不必担心。[Logz.io][9] 也可以使用,但是如果你有很多数据,它的标价有点高。当然,你可能没有很多数据,来使用用它。如果你买不起 Logz.io,你可以看看 [AWS Elasticsearch Service][10] (ES) 。ES 是 Amazon Web Services (AWS) 提供的一项服务,它使得 Elasticsearch 很容易快速工作。它还拥有使用 Lambda 和 S3 将所有AWS日志记录到 ES 的工具。这是一个更便宜的选择,但是需要一些管理操作,并有一些功能限制。
|
||||
|
||||
|
||||
Elastic [offers][11] 的母公司提供一款更强大的产品,它使用开放核心模型,为分析工具和报告提供了额外的选项。它也可以在谷歌云平台或 AWS 上托管。由于这种工具和托管平台的组合提供了比大多数 SaaS 选项更加便宜,这将是最好的选择并且具有很大的价值。该系统可以有效地取代或提供[security information and event management][12] ( SIEM )系统的功能。
|
||||
|
||||
ELK 栈通过 Kibana 提供了很好的可视化工具,但是它缺少警报功能。Elastic 在付费的 X-Pack 插件中提供了提醒功能,但是在开源系统没有内置任何功能。Yelp 已经开发了一种解决这个问题的方法,[ElastAlert][13], 不过还有其他方式。这个额外的软件相当健壮,但是它增加了已经复杂的系统的复杂性。
|
||||
|
||||
#### Graylog
|
||||
|
||||
[Graylog][14] 最近越来越受欢迎,但它是在2010年 Lennart Koopmann 创建并开发的。两年后,一家公司以同样的名字诞生了。尽管它的使用者越来越多,但仍然远远落后于 ELK 栈。这也意味着它具有较少的社区开发特征,但是它可以使用与 ELK stack 相同的 Beats 。Graylog 由于 Graylog Collector Sidecar 使用 [Go][15] 编写所以在 Go 社区赢得了赞誉。
|
||||
|
||||
Graylog 使用 Elasticsearch、[MongoDB][16] 并且 提供 Graylog Server 。这使得它像ELK 栈一样复杂,也许还要复杂一些。然而,Graylog 附带了内置于开源版本中的报警功能,以及其他一些值得注意的功能,如 streaming 、消息重写 和 地理定位。
|
||||
|
||||
streaming 能允许数据在被处理时被实时路由到特定的 Streams。使用此功能,用户可以在单个Stream 中看到所有数据库错误,在不同的 Stream 中看到 web 服务器错误。当添加新项目或超过阈值时,甚至可以基于这些 Stream 提供警报。延迟可能是日志聚合系统中最大的问题之一,Stream消除了灰色日志中的这一问题。一旦日志进入,它就可以通过 Stream 路由到其他系统,而无需全部处理。
|
||||
|
||||
消息重写功能使用开源规则引擎 [Drools][17] 。允许根据用户定义的规则文件评估所有传入的消息,从而可以删除消息(称为黑名单)、添加或删除字段或修改消息。
|
||||
|
||||
Graylog 最酷的功能是它的地理定位功能,它支持在地图上绘制 IP 地址。这是一个相当常见的功能,在 Kibana 也可以这样使用,但是它增加了很多价值——特别是如果你想将它用作 SIEM 系统。地理定位功能在系统的开源版本中提供。
|
||||
|
||||
Graylog 如果你想要的话,它会对开源版本的支持收费。它还为其企业版提供了一个开放的核心模型,提供存档、审计日志记录和其他支持。如果你不需要 Graylog (the company) 支持或托管的,你可以独立使用。
|
||||
|
||||
#### Fluentd
|
||||
|
||||
[Fluentd][18] 是 [Treasure Data][19] 开发的,[CNCF][20] 已经将它作为一个孵化项目。它是用 C 和 Ruby 编写的,并由[AWS][21] 和 [Google Cloud][22]推荐。fluentd已经成为许多装置中logstach的常用替代品。它充当本地聚合器,收集所有节点日志并将其发送到中央存储系统。它不是日志聚合系统。
|
||||
|
||||
它使用一个强大的插件系统,提供不同数据源和数据输出的快速和简单的集成功能。因为有超过500个插件可用,所以你的大多数用例都应该包括在内。如果没有,这听起来是一个为开源社区做出贡献的机会。
|
||||
|
||||
Fluentd 由于占用内存少(只有几十兆字节)和高吞吐量特性,是 Kubernetes 环境中的常见选择。在像 [Kubernetes][23] 这样的环境中,每个pod 都有一个 Fluentd sidecar ,内存消耗会随着每个新 pod 的创建而线性增加。在这中情况下,使用 Fluentd 将大大降低你的系统利用率。这对于Java开发的工具来说,这是一个常见的问题,这些工具旨在为每个节点运行一个工具,而内存开销并不是主要问题。
|
||||
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
via: https://opensource.com/article/18/9/open-source-log-aggregation-tools
|
||||
|
||||
作者:[Dan Barker][a]
|
||||
选题:[lujun9972](https://github.com/lujun9972)
|
||||
译者:[heguangzhi](https://github.com/heguangzhi)
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||
|
||||
[a]: https://opensource.com/users/barkerd427
|
||||
[1]: https://www.elastic.co/webinars/introduction-elk-stack
|
||||
[2]: https://www.elastic.co/
|
||||
[3]: https://www.elastic.co/products/elasticsearch
|
||||
[4]: https://www.elastic.co/products/logstash
|
||||
[5]: https://www.elastic.co/products/kibana
|
||||
[6]: http://kafka.apache.org/
|
||||
[7]: https://redis.io/
|
||||
[8]: https://www.nginx.com/
|
||||
[9]: https://logz.io/
|
||||
[10]: https://aws.amazon.com/elasticsearch-service/
|
||||
[11]: https://www.elastic.co/cloud
|
||||
[12]: https://en.wikipedia.org/wiki/Security_information_and_event_management
|
||||
[13]: https://github.com/Yelp/elastalert
|
||||
[14]: https://www.graylog.org/
|
||||
[15]: https://opensource.com/tags/go
|
||||
[16]: https://www.mongodb.com/
|
||||
[17]: https://www.drools.org/
|
||||
[18]: https://www.fluentd.org/
|
||||
[19]: https://www.treasuredata.com/
|
||||
[20]: https://www.cncf.io/
|
||||
[21]: https://aws.amazon.com/blogs/aws/all-your-data-fluentd/
|
||||
[22]: https://cloud.google.com/logging/docs/agent/
|
||||
[23]: https://opensource.com/resources/what-is-kubernetes
|
||||
|
Loading…
Reference in New Issue
Block a user