mirror of https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00

translated: [翻译完成][tech]: 20230106.0 ⭐️⭐️ Use time-series data to power your edge projects with open source tools.md

This commit is contained in: parent b2619a80f1, commit 66a5ea7ae3

[#]: subject: "Use time-series data to power your edge projects with open source tools"
[#]: via: "https://opensource.com/article/23/1/time-series-data-edge-open-source-tools"
[#]: author: "Zoe Steinkamp https://opensource.com/users/zoesteinkamp"
[#]: collector: "lkxed"
[#]: translator: "ZhangZhanhaoxiang"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Use time-series data to power your edge projects with open source tools
======

Data gathered as it changes over time is known as time-series data. Today it is part of every industry and ecosystem, it makes up a large share of the growing IoT sector, and it will play an ever larger role in everyday life. But time-series data and its requirements are hard to work with, because few tools are purpose-built to handle it. In this article, I go into detail about those problems and how InfluxData has been working to solve them for the past 10 years.

### InfluxData

InfluxData is an open source time-series database platform. You may know the company through [InfluxDB][1], but you may not have known that it specializes in time-series databases. This matters because managing time-series data presents two distinct challenges: the storage lifecycle and queries.

When it comes to storage lifecycle, it's common for developers to initially collect and analyze highly detailed data. But developers want to store smaller, downsampled datasets that describe trends without taking up as much storage space.

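To make the downsampling idea concrete, here is a minimal Python sketch (not InfluxDB code; the data and function name are invented for illustration) that collapses per-second readings into one mean per one-minute window:

```python
from statistics import mean

def downsample(points, window_s=60):
    """Group (timestamp_s, value) points into windows and keep one mean per window."""
    windows = {}
    for ts, value in points:
        windows.setdefault(ts // window_s, []).append(value)
    # One (window_start, mean) pair per window, in time order.
    return [(w * window_s, mean(vals)) for w, vals in sorted(windows.items())]

# 120 seconds of per-second readings collapse into 2 summary points.
raw = [(t, float(t % 10)) for t in range(120)]
summary = downsample(raw)
print(len(raw), "->", len(summary))   # 120 -> 2
```

The detailed points can then be expired while the compact summary is kept long-term.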
When querying a database, you don't want to query your data by ID; you want to query by time range. One of the most common operations on time-series data is summarizing it over a long period of time. This kind of query is slow in a typical relational database that uses rows and columns to describe the relationships between data points. A database designed for time-series data can handle these queries dramatically faster. InfluxDB has its own built-in query language, Flux, which is purpose-built for querying time-series datasets.

![Image of how Telegraf works.][2]

### Data acquisition

Data acquisition and data manipulation come out of the box with some excellent tools. InfluxData has over 12 client libraries that let you write and query data in the coding language of your choice, which is a great fit for custom use cases. The open source ingest agent, Telegraf, includes over 300 input and output plugins, and if you're a developer, you can contribute your own plugin as well.

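For a sense of what a client library writes under the hood, here is a small Python sketch that formats a point in InfluxDB's line protocol (measurement name, comma-separated tags, fields, and an optional timestamp). It is simplified for illustration — real line protocol has escaping and type-suffix rules this sketch omits — and the measurement and tag names are taken from the example query below:

```python
def to_line_protocol(measurement, tags, fields, timestamp=None):
    """Build one simplified line-protocol record: measurement,tags fields [timestamp]."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    line = f"{measurement}{tag_part} {field_part}"
    return f"{line} {timestamp}" if timestamp is not None else line

line = to_line_protocol("city_IoT",
                        {"source": "bicycle", "neighborhood_id": "3"},
                        {"counter": 12})
print(line)
# city_IoT,neighborhood_id=3,source=bicycle counter=12
```

Client libraries build and batch records like this for you, so you rarely format them by hand.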
InfluxDB can also accept a CSV upload for small historical data sets, as well as batch imports for large data sets.

```
import "math"

bicycles3 = from(bucket: "smartcity")
    |> range(start: 2021-03-01T00:00:00Z, stop: 2021-04-01T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "city_IoT")
    |> filter(fn: (r) => r._field == "counter")
    |> filter(fn: (r) => r.source == "bicycle")
    |> filter(fn: (r) => r.neighborhood_id == "3")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

bicycles4 = from(bucket: "smartcity")
    |> range(start: 2021-03-01T00:00:00Z, stop: 2021-04-01T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "city_IoT")
    |> filter(fn: (r) => r._field == "counter")
    |> filter(fn: (r) => r.source == "bicycle")
    |> filter(fn: (r) => r.neighborhood_id == "4")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

join(tables: {neighborhood_3: bicycles3, neighborhood_4: bicycles4}, on: ["_time"], method: "inner")
    |> keep(columns: ["_time", "_value_neighborhood_3", "_value_neighborhood_4"])
    |> map(fn: (r) => ({
        r with
        difference_value: math.abs(x: (r._value_neighborhood_3 - r._value_neighborhood_4))
    }))
```

### Flux

Flux is our internal query language, built from the ground up to handle time-series data. It's also the underlying engine for several of our tools, including tasks, alerts, and notifications. To dissect the Flux query above, you need to define a few terms. For starters, a "bucket" is what we call a database: you configure your buckets and then add your data streams into them. The query calls the smartcity bucket with a specific time range (one month, in this example). You can pull all the data from a bucket, but most users include a time range. That's the most basic Flux query you can do.

Next, I add filters, which narrow the data down to something more exact and manageable. For example, I filter for the count of bicycles in the neighborhood with the id of 3. From there, I use aggregateWindow to get the mean for every hour, so I expect to receive a table with one value for each hour in the range. I run the exact same query for neighborhood 4 as well. Finally, I join the two tables and take the difference between bike usage in these two neighborhoods.

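In plain Python terms, that last step is an inner join on timestamps followed by an absolute difference. Here is a rough sketch with invented hourly values (the timestamps and numbers are made up for illustration):

```python
def hourly_difference(series_a, series_b):
    """Inner-join two {timestamp: value} series and take |a - b| per shared timestamp."""
    shared = sorted(series_a.keys() & series_b.keys())
    return {ts: abs(series_a[ts] - series_b[ts]) for ts in shared}

neighborhood_3 = {"08:00": 14.0, "09:00": 30.0, "10:00": 22.0}
neighborhood_4 = {"09:00": 18.0, "10:00": 25.0, "11:00": 9.0}
print(hourly_difference(neighborhood_3, neighborhood_4))
# {'09:00': 12.0, '10:00': 3.0}
```

As with the Flux `join(..., method: "inner")`, hours present in only one series drop out of the result.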
This is great if you want to know which hours are high-traffic hours. Obviously, this is just a small example of the power of Flux queries, but it demonstrates some of the tools Flux comes with. Flux also has a large set of data analysis and statistics functions; for those, I suggest checking out the Flux documentation.

```
import "influxdata/influxdb/tasks"

option task = {name: "PB_downsample", every: 1h, offset: 10s}

from(bucket: "plantbuddy")
    |> range(start: tasks.lastSuccess(orTime: -task.every))
    |> filter(fn: (r) => r["_measurement"] == "sensor_data")
    |> aggregateWindow(every: 10m, fn: last, createEmpty: false)
    |> yield(name: "last")
    |> to(bucket: "downsampled")
```

### Tasks

An InfluxDB task is a scheduled Flux script that takes a stream of input data and modifies or analyzes it in some way, then stores the result in a new bucket or performs other actions. Storing a smaller data set into a new bucket is called "downsampling," and it's a core feature of the database and a core part of the time-series data lifecycle.

You can see in the task example above that I've downsampled the data: I take the last value in every 10-minute window and store it in the downsampled bucket. The original data set might have thousands of data points in each 10-minute window, but the downsampled bucket now receives only one value per window. Note that I'm also using the tasks.lastSuccess function in range. This tells InfluxDB to run the task over the period since its last successful run, so if the task has been failing for the past few hours, the next run reaches back to the last success instead of losing that data. This is great built-in error handling.

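The catch-up behavior of tasks.lastSuccess can be sketched in a few lines of Python; this is an illustration of the idea, not InfluxDB's implementation, and the timestamps are arbitrary numbers:

```python
def task_window(now, last_success, every):
    """Start the task's query range at the last successful run, so failed
    runs are covered; fall back to one interval if there is no history."""
    start = last_success if last_success is not None else now - every
    return (start, now)

# Normal hourly run: the window covers just the last hour.
print(task_window(now=100, last_success=99, every=1))   # (99, 100)
# Two failed runs in a row: the next run covers three hours, so nothing is lost.
print(task_window(now=100, last_success=97, every=1))   # (97, 100)
```

This mirrors `range(start: tasks.lastSuccess(orTime: -task.every))` in the task above: the `orTime` fallback plays the role of the `last_success is None` branch.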
![Image of the checks and alerts notification system.][3]

### Checks and alerts

InfluxDB includes a checks, alerts, and notifications system. The system is very straightforward. You start with a check that periodically looks at the data for anomalies you've defined, normally as thresholds. For example, any temperature value below 32°F gets a status of `WARN`, anything below 0°F gets `CRITICAL`, and anything above 32°F gets `OK`. A check can run as often as you deem necessary, and InfluxDB keeps a recorded history of your checks and the current status of each. You aren't required to set up a notification; you can simply consult your check history as needed.

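The threshold logic just described can be written as a plain-Python sketch (thresholds in °F and status names as in the text, with the most severe threshold checked first):

```python
def check_temperature(value_f):
    """Map a temperature reading to a check status, most severe threshold first."""
    if value_f < 0:
        return "CRITICAL"
    if value_f < 32:
        return "WARN"
    return "OK"

for reading in (75.0, 20.0, -5.0):
    print(reading, check_temperature(reading))
# 75.0 OK
# 20.0 WARN
# -5.0 CRITICAL
```

Ordering the branches from most to least severe is what keeps a -5°F reading `CRITICAL` even though it is also below the `WARN` threshold.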
Many people do choose to set up notifications. For that, you need to define a notification endpoint; for example, a chat application could receive your notifications through an HTTP call. You then define when you'd like to receive notifications: you might run checks every hour but send notifications only every 24 hours. You can have a notification respond to a change in status, such as `WARN` to `CRITICAL`, or fire whenever a value is `CRITICAL`, regardless of whether it just changed from `OK` to `WARN`. It's a highly customizable system, and the Flux code it generates can also be edited.

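The two notification rules mentioned — fire on a status change, or always while the status is `CRITICAL` — can be sketched like this (a toy model of the rule, not the InfluxDB API):

```python
def should_notify(previous, current):
    """Notify on any status change, and always while CRITICAL."""
    return current == "CRITICAL" or current != previous

print(should_notify("WARN", "CRITICAL"))      # True  (status changed)
print(should_notify("OK", "OK"))              # False (nothing to report)
print(should_notify("CRITICAL", "CRITICAL"))  # True  (still critical)
```

In practice you would pick one rule or the other per notification; combining them here just shows both conditions in one place.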
![Image of the new Edge feature.][4]

### Edge

To wrap up, I'd like to bring all the core features together, including a very special new feature that was recently released. Edge to cloud is a powerful capability that lets you run open source InfluxDB at the edge and store your data locally during connectivity issues. When connectivity is restored, it streams the data to the InfluxData cloud platform.

This is significant for edge devices collecting important data, where any loss of data is detrimental. You mark a bucket for replication to the cloud, and that bucket gets a disk-backed queue that stores the data locally. You then define which cloud bucket it should replicate into. The data is stored locally until the device reconnects to the cloud.

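Conceptually, the disk-backed queue behaves like a buffer that drains only while the device is connected. Here is a minimal in-memory Python sketch of that behavior (the real edge replication persists its queue to disk; this class and its names are invented for illustration):

```python
class ReplicationBuffer:
    """Queue points locally; flush them to the 'cloud' only when connected."""
    def __init__(self):
        self.queue = []   # local disk-backed queue, modeled as a list
        self.cloud = []   # stand-in for the cloud bucket

    def write(self, point, connected):
        self.queue.append(point)
        if connected:
            self.flush()

    def flush(self):
        self.cloud.extend(self.queue)
        self.queue.clear()

buf = ReplicationBuffer()
buf.write("t1", connected=False)   # offline: stored locally
buf.write("t2", connected=False)   # still offline
buf.write("t3", connected=True)    # reconnected: everything streams up
print(buf.cloud, buf.queue)        # ['t1', 't2', 't3'] []
```

The key property is that nothing is dropped while offline; reconnecting replays the whole backlog in order.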
### InfluxDB and the IoT Edge

Suppose you have a project where you want to [monitor the health of household plants][5] using IoT sensors attached to the plants, with your laptop set up as the edge device. When your laptop is closed or otherwise offline, it stores the data locally, then streams it to your cloud bucket when it reconnects.

![Image showing how Plant buddy works.][6]

One thing to notice is that the data is downsampled on the local device before it's stored in the replication bucket. Your plant's sensors provide a data point every second, but the device condenses that to one-minute averages so there is less data to store. In the cloud account, you might add alerts and notifications that tell you when the plant's moisture drops below a certain level and it needs to be watered. You could also build visuals for a website that tells users about their plants' health.

Databases are the backbone of many applications. Working with time-stamped data in a time-series database platform like InfluxDB saves developers time and gives them access to a wide range of purpose-built tools and services. The maintainers of InfluxDB love seeing what people are building within our open source community, so connect with us and share your projects and code with others!

--------------------------------------------------------------------------------

via: https://opensource.com/article/23/1/time-series-data-edge-open-source-tools

Author: [Zoe Steinkamp][a]
Topic selection: [lkxed][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and proudly presented by [Linux中国](https://linux.cn/)

[a]: https://opensource.com/users/zoesteinkamp
[b]: https://github.com/lkxed
[1]: https://opensource.com/article/17/8/influxdb-time-series-database-stack
[2]: https://opensource.com/sites/default/files/2022-12/Telegraf.png
[3]: https://opensource.com/sites/default/files/2022-12/TimeSeriesChecks%26Alerts.png
[4]: https://opensource.com/sites/default/files/2022-12/TimSeriesEdge.png
[5]: https://opensource.com/article/22/5/plant-care
[6]: https://opensource.com/sites/default/files/2022-12/TimeSeriesplantbuddy.png