TSD:20190413 The Fargate Illusion.md

Xingyu.Wang 2019-04-16 00:52:02 +08:00
parent c0d64fc697
commit d8f628c98b
2 changed files with 447 additions and 448 deletions

@@ -1,448 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (The Fargate Illusion)
[#]: via: (https://leebriggs.co.uk/blog/2019/04/13/the-fargate-illusion.html)
[#]: author: (Lee Briggs https://leebriggs.co.uk/)
The Fargate Illusion
======
I've been building a Kubernetes based platform at $work now for almost a year, and I've become a bit of a Kubernetes apologist. It's true, I think the technology is fantastic. I am however under no illusions about how difficult it is to operate and maintain. I read posts like [this][1] one earlier in the year and found myself nodding along to certain aspects of the opinion. If I was in a smaller company, with 10 to 15 engineers, I'd be horrified if someone suggested managing and maintaining a fleet of Kubernetes clusters. The operational overhead is just too high.
Despite my love for all things Kubernetes at this point, I do remain curious about the notion that “serverless” computing will kill the ops engineer. The main source of intrigue here is the desire to stay gainfully employed in the future - if we aren't going to need ops engineers in our glorious future, I'd like to see what all the fuss is about. I've done some experimentation in Lambda and Google Cloud Functions and been impressed by what I saw, but I still firmly believe that serverless solutions only solve a percentage of the problem.
I've had my eye on [AWS Fargate][2] for some time now and it's something that developers at $work have been gleefully pointing at as “serverless computing” - mainly because with Fargate, you can run your Docker container without having to manage the underlying nodes. I wanted to see what that actually meant - so I set about trying to get an app running on Fargate from scratch. I defined the success criteria here as something close-ish to a “production ready” application, so I wanted to have the following:
* A running container on Fargate
* With configuration pushed down in the form of environment variables
* “Secrets” should not be in plaintext
* Behind a loadbalancer
* TLS enabled with a valid SSL certificate
I approached this whole task from an infrastructure as code mentality, and instead of following the default AWS console wizards, I used terraform to define the infrastructure. It's very possible this overcomplicated things, but I wanted to make sure any deployment was repeatable and discoverable to anyone else wanting to follow along.
All of the above criteria are generally achievable with a Kubernetes based platform using a few external add-ons and plugins, so I'm admittedly approaching this whole task with a comparative mentality - because I'm comparing it with my common workflow. My main goal was to see how easy this was with Fargate, especially when compared with Kubernetes. I was pretty surprised with the outcome.
### AWS has overhead
I had a clean AWS account and was determined to go from zero to a deployed webapp. Like any other infrastructure in AWS, I had to get the baseline infrastructure working - so I first had to define a VPC.
I wanted to follow the best practices, so I carved the VPC up into subnets across availability zones, with a public and a private subnet. It occurred to me at this point that as long as this need was always there, I'd probably be able to find a job of some description. The notion that AWS is operationally “free” is something that has irked me for quite some time now. Many people in the developer community take for granted how much work and effort there is in setting up and defining a well designed AWS account and infrastructure. This is _before_ we even start talking about a multi-account architecture - I'm still in a single account here and I'm already having to define infrastructure and traditional network items.
It's also worth remembering here, I've done this quite a few times now, so I _knew_ exactly what to do. I could have used the default VPC in my account, and the pre-provided subnets, which I expect many people who are getting started might do. This took me about half an hour to get running, but I couldn't help but think here that even if I want to run lambda functions, I still need some kind of connectivity and networking. Defining NAT gateways and routing in a VPC doesn't feel very serverless at all, but it has to be done to get things moving.
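The post doesn't show the VPC code itself, so the following is only a minimal sketch of what that baseline could look like, assuming the community `terraform-aws-modules/vpc/aws` module. The CIDR ranges and availability zone names are placeholders, and the module version is deliberately left unpinned because the one actually used is not stated.

```hcl
# Hypothetical sketch, not the author's code: a VPC with public and private
# subnets across two availability zones.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "${var.name}"
  cidr = "10.0.0.0/16" # placeholder range

  azs             = ["eu-west-1a", "eu-west-1b"]     # placeholder zones
  public_subnets  = ["10.0.0.0/24", "10.0.1.0/24"]   # load balancer lives here
  private_subnets = ["10.0.10.0/24", "10.0.11.0/24"] # tasks live here

  # Workloads in the private subnets need outbound access,
  # for example to pull container images.
  enable_nat_gateway = true
  single_nat_gateway = true
}
```

A single NAT gateway keeps cost down for an experiment; one per availability zone would be the more resilient choice.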
### Run my damn container
Once I had the base infrastructure up and running, I now wanted to get my docker container running. I started examining the Fargate docs and browsed through the [Getting Started][3] docs and something immediately popped out at me:
> ![][4]
Hold on a minute, there's at least THREE steps here just to get my container up and running? This isn't quite how this whole thing was sold to me, but let's get started.
#### Task Definitions
A task definition defines the actual container you want to run. The problem I ran into immediately here is that this thing is insanely complicated. Lots of the options here are very straightforward, like specifying the docker image and memory limits, but I also had to define a networking model and a variety of other options that I wasn't really familiar with. Really? If I had come into this process with absolutely no AWS knowledge I'd be incredibly overwhelmed at this stage. A full list of the [parameters][5] can be found on the AWS page, and the list is long. I knew my container needed to have some environment variables, and it needed to expose a port. So I defined that first, with the help of a fantastic [terraform module][6] which really made this easier. If I didn't have this, I'd be hand writing JSON to define my container definition.
First, I defined some environment variables:
```
container_environment_variables = [
{
name = "USER"
value = "${var.user}"
},
{
name = "PASSWORD"
value = "${var.password}"
}
]
```
Then I compiled the task definition using the module I mentioned above:
```
module "container_definition_app" {
source = "cloudposse/ecs-container-definition/aws"
version = "v0.7.0"
container_name = "${var.name}"
container_image = "${var.image}"
container_cpu = "${var.ecs_task_cpu}"
container_memory = "${var.ecs_task_memory}"
container_memory_reservation = "${var.container_memory_reservation}"
port_mappings = [
{
containerPort = "${var.app_port}"
hostPort = "${var.app_port}"
protocol = "tcp"
},
]
environment = "${local.container_environment_variables}"
}
```
I was pretty confused at this point - I need to define a lot of configuration here to get this running and I've barely even started, but it made a little sense - anything running a docker container needs to have _some_ idea of the configuration values of the docker container. I've [previously written][7] about the problems with Kubernetes and configuration management and the same problem seemed to be rearing its ugly head again here.
Next, I defined the task definition from the module above (which thankfully abstracted the required JSON away from me - if I had to hand write JSON at this point I'd have probably given up).
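To give a feel for what the module saves you from, here is a rough sketch of the same container definition written by hand. It is an illustration under assumptions: the local name `container_definitions_json` is made up, and the JSON the module really generates may contain more fields than shown.

```hcl
# Hypothetical sketch: a hand-written equivalent of the container definition.
# The field names follow the ECS container definition JSON format.
locals {
  container_definitions_json = <<EOF
[
  {
    "name": "${var.name}",
    "image": "${var.image}",
    "essential": true,
    "portMappings": [
      {
        "containerPort": ${var.app_port},
        "hostPort": ${var.app_port},
        "protocol": "tcp"
      }
    ],
    "environment": [
      { "name": "USER", "value": "${var.user}" },
      { "name": "PASSWORD", "value": "${var.password}" }
    ]
  }
]
EOF
}
```

A string like this could be passed to the `container_definitions` argument of `aws_ecs_task_definition` in place of the module output.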
I realised immediately I was missing something as I was defining the module parameters. I need an IAM role as well! Okay, let me define that:
```
resource "aws_iam_role" "ecs_task_execution" {
name = "${var.name}-ecs_task_execution"
assume_role_policy = <<EOF
{
"Version": "2008-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow"
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
count = "${length(var.policies_arn)}"
role = "${aws_iam_role.ecs_task_execution.id}"
policy_arn = "${element(var.policies_arn, count.index)}"
}
```
That makes sense, I'd need to define an RBAC policy in Kubernetes, so I'm still not exactly losing or gaining anything here. I am starting to think at this point that this feels very familiar from a Kubernetes perspective.
```
resource "aws_ecs_task_definition" "app" {
family = "${var.name}"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "${var.ecs_task_cpu}"
memory = "${var.ecs_task_memory}"
execution_role_arn = "${aws_iam_role.ecs_task_execution.arn}"
task_role_arn = "${aws_iam_role.ecs_task_execution.arn}"
container_definitions = "${module.container_definition_app.json}"
}
```
At this point, I've written quite a few lines of code to get this running, read a lot of ECS documentation and all I've done is define a task definition. I still haven't got this thing running yet. I'm really confused at this point what the value add is here over a Kubernetes based platform, but I continued onwards.
#### Services
A service is partly how to expose the container to the world, and partly how you define how many replicas it has. My first thought was “Ah! This is like a Kubernetes service!” and I set about mapping the ports and such like. Here was my first run at the terraform:
```
resource "aws_ecs_service" "app" {
name = "${var.name}"
cluster = "${module.ecs.this_ecs_cluster_id}"
task_definition = "${data.aws_ecs_task_definition.app.family}:${max(aws_ecs_task_definition.app.revision, data.aws_ecs_task_definition.app.revision)}"
desired_count = "${var.ecs_service_desired_count}"
launch_type = "FARGATE"
deployment_maximum_percent = "${var.ecs_service_deployment_maximum_percent}"
deployment_minimum_healthy_percent = "${var.ecs_service_deployment_minimum_healthy_percent}"
network_configuration {
subnets = ["${values(local.private_subnets)}"]
security_groups = ["${module.app.this_security_group_id}"]
}
}
```
I again got frustrated when I had to define the security group for this that allowed access to the ports needed, but I did so and plugged that into the network configuration. Then I got a smack in the face.
I need to define my own loadbalancer?
What?
Surely not?
##### LoadBalancers Never Go Away
I was honestly kind of floored by this, I'm not even sure why. I've gotten so used to Kubernetes services and ingress objects that I completely took for granted how easy it is to get my application on the web with Kubernetes. Of course, we've spent months building a platform to make this easier at $work. I'm a heavy user of [external-dns][8] and [cert-manager][9] to automate populating DNS entries on ingress objects and automating TLS certificates and I am very aware of the work needed to get these set up, but I honestly thought it would be easier to do this on Fargate. I recognise that Fargate isn't claiming to be the be-all and end-all of how to run applications - it's just abstracting away the node management - but I have been consistently told this is _easier_ than Kubernetes. I really was surprised. Defining a LoadBalancer (even if you don't want to use Ingresses and Ingress controllers) is part and parcel of deploying a service to Kubernetes, and I had to do the same thing again here. It just all felt so familiar.
I now realised I needed:
* A loadbalancer
* A TLS certificate
* A DNS entry
So I set about making those. I made use of some popular terraform modules, and came up with this:
```
# Define a wildcard cert for my app
module "acm" {
source = "terraform-aws-modules/acm/aws"
version = "v1.1.0"
create_certificate = true
domain_name = "${var.route53_zone_name}"
zone_id = "${data.aws_route53_zone.this.id}"
subject_alternative_names = [
"*.${var.route53_zone_name}",
]
tags = "${local.tags}"
}
# Define my loadbalancer
resource "aws_lb" "main" {
name = "${var.name}"
subnets = [ "${values(local.public_subnets)}" ]
security_groups = ["${module.alb_https_sg.this_security_group_id}", "${module.alb_http_sg.this_security_group_id}"]
}
resource "aws_lb_target_group" "main" {
name = "${var.name}"
port = "${var.app_port}"
protocol = "HTTP"
vpc_id = "${local.vpc_id}"
target_type = "ip"
depends_on = [ "aws_lb.main" ]
}
# Redirect all traffic from the ALB to the target group
resource "aws_lb_listener" "main" {
load_balancer_arn = "${aws_lb.main.id}"
port = "80"
protocol = "HTTP"
default_action {
target_group_arn = "${aws_lb_target_group.main.id}"
type = "forward"
}
}
resource "aws_lb_listener" "main-tls" {
load_balancer_arn = "${aws_lb.main.id}"
port = "443"
protocol = "HTTPS"
certificate_arn = "${module.acm.this_acm_certificate_arn}"
default_action {
target_group_arn = "${aws_lb_target_group.main.id}"
type = "forward"
}
}
```
I'll be completely honest here - I screwed this up several times. I had to fish around in the AWS console to figure out what I'd done wrong. It certainly wasn't an “easy” process - and I've done this before - many times. Honestly, at this point, Kubernetes looked positively _enticing_ to me, but I realised it was because I was very familiar with it. If I was lucky enough to be using a managed Kubernetes platform (with external-dns and cert-manager preinstalled) I'd really wonder what value add I was missing from Fargate. It just really didn't feel that easy.
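One detail worth noting in the listener configuration: the port 80 listener forwards plain HTTP to the target group, so the app stays reachable without TLS. If the goal is TLS everywhere, a redirect is one option. The sketch below is an assumption rather than what the post deployed; the resource name `http_redirect` is made up, and it needs an AWS provider version that supports `redirect` actions on listeners.

```hcl
# Hypothetical alternative to the port 80 "forward" listener:
# answer every plain HTTP request with a permanent redirect to HTTPS.
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = "${aws_lb.main.id}"
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```

If this replaced `aws_lb_listener.main`, anything that depends on that listener by name would need to point at the new resource instead.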
After a bit of back and forth, I now had a working ECS service. The final definition, including the service, looked a bit like this:
```
data "aws_ecs_task_definition" "app" {
task_definition = "${var.name}"
depends_on = ["aws_ecs_task_definition.app"]
}
resource "aws_ecs_service" "app" {
name = "${var.name}"
cluster = "${module.ecs.this_ecs_cluster_id}"
task_definition = "${data.aws_ecs_task_definition.app.family}:${max(aws_ecs_task_definition.app.revision, data.aws_ecs_task_definition.app.revision)}"
desired_count = "${var.ecs_service_desired_count}"
launch_type = "FARGATE"
deployment_maximum_percent = "${var.ecs_service_deployment_maximum_percent}"
deployment_minimum_healthy_percent = "${var.ecs_service_deployment_minimum_healthy_percent}"
network_configuration {
subnets = ["${values(local.private_subnets)}"]
security_groups = ["${module.app_sg.this_security_group_id}"]
}
load_balancer {
target_group_arn = "${aws_lb_target_group.main.id}"
container_name = "app"
container_port = "${var.app_port}"
}
depends_on = [
"aws_lb_listener.main",
]
}
```
I felt like it was close at this point, but then I remembered I'd only done 2 of the required 3 steps from the original “Getting Started” document - I still needed to define the ECS cluster.
#### Clusters
Thanks to a very well [defined module][10], defining the cluster to run all this on was actually very easy.
```
module "ecs" {
source = "terraform-aws-modules/ecs/aws"
version = "v1.1.0"
name = "${var.name}"
}
```
What surprised me the _most_ here is why I had to define a cluster at all. As someone reasonably familiar with ECS it makes some sense you'd need a cluster, but I tried to consider this from the point of view of someone having to go through this process as a complete newcomer - it seems surprising to me that Fargate is billed as “serverless” but you still need to define a cluster. It's a small detail, but it really stuck in my mind.
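For a sense of how little a Fargate-only cluster actually contains, here is a minimal sketch using the provider's own resource instead of the module. The resource name `main` is an assumption, and the module may configure more than this.

```hcl
# Hypothetical sketch: with Fargate there are no container instances to
# register, so the cluster is essentially just a named grouping for services.
resource "aws_ecs_cluster" "main" {
  name = "${var.name}"
}
```

With this variant the service would reference `"${aws_ecs_cluster.main.id}"` rather than the module output.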
### Tell me your secrets
At this stage of the process, I was fairly happy I managed to get something running. There was however something missing from my original criteria. If we go all the way back to the task definition, you'll remember my app has an environment variable for the password:
```
container_environment_variables = [
{
name = "USER"
value = "${var.user}"
},
{
name = "PASSWORD"
value = "${var.password}"
}
]
```
If I looked at my task definition in the AWS console, my password was there, staring at me in plaintext. I wanted this to end, so I set about trying to move this into something else, similar to [Kubernetes secrets][11].
#### AWS SSM
The way Fargate/ECS does the secret management portion is to use [AWS SSM][12] (the full name for this service is AWS Systems Manager Parameter Store, but I refuse to use that name because quite frankly it's stupid).
The AWS documentation [covers this][13] fairly well, so I set about converting this to terraform.
##### Specifying the Secret
First, you have to define a parameter and give it a name. In terraform, it looks like this:
```
resource "aws_ssm_parameter" "app_password" {
name = "${var.app_password_param_name}" # The name of the value in AWS SSM
type = "SecureString"
value = "${var.app_password}" # The actual value of the password, like correct-horse-battery-staple
}
```
Obviously the key component here is the “SecureString” type. This uses the default AWS KMS key to encrypt the data, something that was not immediately obvious to me. This has a huge advantage over Kubernetes secrets, which aren't encrypted in etcd by default.
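If the default key isn't what you want, the parameter can be pointed at a customer managed key. This is a sketch under assumptions, not part of the original setup: the resource names `app_secrets` and `app_password_cmk` are made up for illustration.

```hcl
# Hypothetical sketch: encrypt the parameter with a dedicated KMS key
# instead of the account's default key.
resource "aws_kms_key" "app_secrets" {
  description = "Encrypts SSM parameters for ${var.name}"
}

resource "aws_ssm_parameter" "app_password_cmk" {
  name   = "${var.app_password_param_name}"
  type   = "SecureString"
  key_id = "${aws_kms_key.app_secrets.key_id}"
  value  = "${var.app_password}"
}
```

Either way, the plaintext value still ends up in the terraform state file, so the state needs protecting too.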
Then I specified another local value map for ECS, and passed that as a secret parameter:
```
container_secrets = [
{
name = "PASSWORD"
valueFrom = "${var.app_password_param_name}"
},
]
module "container_definition_app" {
source = "cloudposse/ecs-container-definition/aws"
version = "v0.7.0"
container_name = "${var.name}"
container_image = "${var.image}"
container_cpu = "${var.ecs_task_cpu}"
container_memory = "${var.ecs_task_memory}"
container_memory_reservation = "${var.container_memory_reservation}"
port_mappings = [
{
containerPort = "${var.app_port}"
hostPort = "${var.app_port}"
protocol = "tcp"
},
]
environment = "${local.container_environment_variables}"
secrets = "${local.container_secrets}"
}
```
##### A problem arises
At this point, I redeployed my task definition, and was very confused. Why isn't the task rolling out properly? I kept seeing in the console that the running app was still using the previous task definition (version 7) when the new task definition (version 8) was available. This took me way longer than it should have to figure out, but in the events screen on the console, I noticed an IAM error. I had missed a step, and the container couldn't read the secret from AWS SSM, because it didn't have the correct IAM permissions. This was the first time I got genuinely frustrated with this whole thing. The feedback here was _terrible_ from a user experience perspective. If I hadn't known any better, I would have figured everything was fine, because there was still a task running, and my app was still available via the correct URL - I was just getting the old config.
In a Kubernetes world, I would have clearly seen an error in the pod definition. It's absolutely fantastic that Fargate makes sure my app doesn't go down, but as an operator I need some actual feedback as to what's happening. This really wasn't good enough. I genuinely hope someone from the Fargate team reads this and tries to improve this experience.
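The post doesn't show the fix, so the following is only a sketch of the kind of permission that was missing, assuming the execution role and SSM parameter defined earlier. The policy name is made up.

```hcl
# Hypothetical sketch: let the task execution role read the one parameter
# the container needs at startup.
resource "aws_iam_role_policy" "ecs_read_app_password" {
  name = "${var.name}-read-app-password"
  role = "${aws_iam_role.ecs_task_execution.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameters"],
      "Resource": "${aws_ssm_parameter.app_password.arn}"
    }
  ]
}
EOF
}
```

With the default KMS key this should be enough; a parameter encrypted with a customer managed key would also need `kms:Decrypt` on that key.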
### That's a wrap?
This was the end of the road - my app was running and I'd met all my criteria. I did realise that I had some improvements to make, which included:
* Defining a cloudwatch log group, so I could write logs correctly
* Adding a route53 hosted zone to make the whole thing a little easier to automate from a DNS perspective
* Fixing and rescoping the IAM permissions, which were very broad at this point
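The first two items could be sketched roughly as follows. These are assumptions, not something the post implemented: the log group name, the retention period and the `app.` record name are all placeholders.

```hcl
# Hypothetical sketch: a log group for the container output.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30
}

# Hypothetical sketch: an alias record pointing a hostname at the load balancer.
resource "aws_route53_record" "app" {
  zone_id = "${data.aws_route53_zone.this.zone_id}"
  name    = "app.${var.route53_zone_name}"
  type    = "A"

  alias {
    name                   = "${aws_lb.main.dns_name}"
    zone_id                = "${aws_lb.main.zone_id}"
    evaluate_target_health = true
  }
}
```

The log group on its own does nothing: the container definition would also need a `logConfiguration` using the `awslogs` driver, and the execution role would need permission to write to CloudWatch Logs.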
But honestly at this point, I wanted to reflect on the experience. I threw out a [twitter thread][14] about my experience and then spent the rest of the time thinking about what I really felt here.
### Table Stakes
What I realised, after an evening of reflection, was that this process is largely the same whether you're using Fargate or Kubernetes. What surprised me the most was that despite the regular claims I've heard that Fargate is “easier” I really just couldn't see any benefits over a Kubernetes based platform. Now, if you're in a world where you're building Kubernetes clusters I can absolutely see the value here - managing nodes and the control plane is just overhead you don't really need. The problem is - most consumers of a Kubernetes based platform don't _have_ to do this. If you're lucky enough to be using GKE, you barely even need to think about the management of the cluster, you can run a cluster with a single gcloud command nowadays. I regularly use DigitalOcean's managed Kubernetes service and I can safely say that it was as easy as spinning up a Fargate cluster - in fact in some ways it was easier.
Having to define some infrastructure to run your container is table stakes at this point. Google may have just changed the game this week with their [Google Cloud Run][15] product, but they're massively ahead of everyone else in this field.
What I think can be safely said from this whole experience though is this: _Running containers at scale is still hard_. It requires thought, it requires domain knowledge, it requires collaboration between Operations and Developers. It also requires a foundation to build on - any AWS based operation is going to need to have some fundamental infrastructure defined and running. I'm very intrigued by the “NoOps” concept that some companies seem to aspire to. I guess if you're running a stateless application, and you can put it all inside a lambda function and an API gateway you're probably in a good position, but are we really close to this in any kind of enterprise environment? I really don't think so.
#### Fair Comparisons
Another realisation that struck me is that the comparisons between technology A and technology B often aren't really fair, and I see this very often with AWS. The reality of the situation is often very different from the Jeff Barr blogpost. If you're a small enough company that you can deploy your application in AWS using the AWS console and select all of the defaults, this absolutely is easier. However, I didn't want to use the defaults, because the defaults are almost always not production ready. Once you start to peel back the layers of cloud provider services, you begin to realise that at the end of the day - you're still running software. It still needs to be designed well, deployed well and operated well. I believe that the value add of AWS and Kubernetes and all the other cloud providers is that they make it much, much easier to run, design and operate things well, but it is definitely not free.
#### Arguing for Kubernetes
My final takeaway here is this: if you view Kubernetes purely as a container orchestration tool, you're probably going to love Fargate. However, as I've become more familiar with Kubernetes, I've come to appreciate just how important it is as a technology - not just because it's a great container orchestration tool but also because of its design patterns - it's a declarative, API driven platform. A simple thought that occurred to me during _all_ of this Fargate process was that if I deleted any of this stuff, Fargate isn't necessarily going to recreate it for me. Autoscaling is nice, not having to manage servers and patching and OS updates is awesome, but I felt I'd lost so much by not being able to use Kubernetes' self healing and API driven model. Sure, Kubernetes has a learning curve - but from this experience, so does Fargate.
### Summary
Despite my confusion during some of this process, I really did enjoy the experience. I still believe Fargate is a fantastic technology, and what the AWS team has done with ECS/Fargate really is nothing short of remarkable. My perspective however is that this is definitely not “easier” than Kubernetes, it's just... different.
The problems that arise when running containers in production are largely the same. If you take anything away from this post it should be this: _whichever way you choose is going to have operational overhead_. Don't fall into the trap of believing that you can just pick something and your world is going to be easier. My personal opinion is this: If you have an operations team and your company is going to be deploying containers across multiple app teams - pick a technology and build processes and tooling around it to make it easier.
I'm certainly going to take the claims from people that certain technology is easier with a grain of salt from now on. At this stage, when it comes to Fargate, this sums up my feelings:
> ![][16]
--------------------------------------------------------------------------------
via: https://leebriggs.co.uk/blog/2019/04/13/the-fargate-illusion.html
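Author: [Lee Briggs][a]
Topic selection: [lujun9972][b]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)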
[a]: https://leebriggs.co.uk/
[b]: https://github.com/lujun9972
[1]: https://matthias-endler.de/2019/maybe-you-dont-need-kubernetes/
[2]: https://aws.amazon.com/fargate/
[3]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_GetStarted.html
[4]: https://imgur.com/FpU0lds
[5]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
[6]: https://github.com/cloudposse/terraform-aws-ecs-container-definition
[7]: https://leebriggs.co.uk/blog/2018/05/08/kubernetes-config-mgmt.html
[8]: https://github.com/kubernetes-incubator/external-dns
[9]: https://github.com/jetstack/cert-manager
[10]: https://github.com/terraform-aws-modules/terraform-aws-ecs
[11]: https://kubernetes.io/docs/concepts/configuration/secret/
[12]: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-paramstore.html
[13]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html
[14]: https://twitter.com/briggsl/status/1116870900719030272
[15]: https://cloud.google.com/run/
[16]: https://imgur.com/QfFg225

@@ -0,0 +1,447 @@
[#]: collector: (lujun9972)
[#]: translator: (wxy)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (The Fargate Illusion)
[#]: via: (https://leebriggs.co.uk/blog/2019/04/13/the-fargate-illusion.html)
[#]: author: (Lee Briggs https://leebriggs.co.uk/)
The Fargate Illusion
======
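Over almost a year at $work I have built a Kubernetes-based platform and become something of a Kubernetes apologist. It's true, I think the technology is fantastic. I am, however, under no illusions about how difficult it is to operate and maintain. Earlier this year I read an article like [this one][1] and found myself agreeing with parts of it. If I were at a smaller company with 10 to 15 engineers and someone suggested managing and maintaining a fleet of Kubernetes clusters, I'd be horrified. The operational overhead is just too high!
Although I'm now interested in all things Kubernetes, I remain curious about the claim that “serverless” computing will kill off the ops engineer. That curiosity comes mainly from wanting to stay gainfully employed - if our bright future doesn't need ops engineers, I'd like to see what all the fuss is about. I've experimented with Lambda and Google Cloud Functions and was impressed by the results, but I still firmly believe serverless solutions only solve part of the problem.
I've had my eye on [AWS Fargate][2] for some time, and it's what the developers at $work regard as “serverless computing” - mainly because with Fargate you can run your Docker container without managing the underlying nodes. I wanted to see what that actually meant, so I set out to run an application on Fargate from scratch. I defined success as something close to a “production-grade” application, so I wanted the following:
* A container running on Fargate
* Configuration pushed down as environment variables
* “Secrets” must not be in plaintext
* Behind a load balancer
* TLS enabled with a valid SSL certificate
I approached the whole task with an infrastructure-as-code mindset: instead of following the default AWS console wizards, I used terraform to define the infrastructure. This may well have complicated things, but I wanted any deployment to be repeatable and discoverable for anyone who wants to follow along.
All of the above criteria can generally be met on a Kubernetes-based platform with a few external add-ons and plugins, so I admittedly approached the task in a comparative frame of mind - I was comparing it with my usual workflow. My main goal was to see how easy Fargate was, especially compared with Kubernetes. The result surprised me.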
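### AWS has overhead
I had a clean AWS account and was determined to go from zero to a deployed webapp. As with any other infrastructure in AWS, I first had to get the baseline infrastructure working - so I started by defining a VPC.
I wanted to follow best practice, so I split the VPC into subnets across availability zones, with a public and a private subnet. It occurred to me at this point that as long as this need exists, I'll be able to find a job doing it. The notion that AWS is operationally “free” has irked me for a long time. Many people in the developer community take for granted how much work and effort goes into setting up and defining a well-designed AWS account and infrastructure. And this is *before* we even start talking about a multi-account architecture - I'm still in a single account here, and I already have to define infrastructure and traditional networking components.
It's also worth remembering that I've done this many times, so I *knew* what to do. I could have used the default VPC in my account with its pre-provided subnets, which I expect many people getting started would do. It took me about half an hour to get running, but I couldn't help thinking that even if I wanted to run lambda functions, I'd still need some kind of connectivity and networking. Defining NAT gateways and routing in a VPC doesn't feel “serverless” at all, but it has to be done to move forward.
### Run a simple container
With the base infrastructure up and running, I wanted to get my Docker container running. I started going through the Fargate documentation and browsed the [Getting Started][3] docs, and this immediately jumped out at me:
![][4]
Hold on, there are at least **three** steps just to get my container running? That's not how this was sold to me, but let's get started.
#### Task Definitions
A “task definition” defines the actual container to run. The problem I ran into here is that defining a task is very complicated. Many of the options are simple, such as specifying the Docker image and memory limits, but I also had to define a networking model and various other options I wasn't familiar with. Is this really necessary? Had I come into this process with no AWS knowledge at all, I'd have felt overwhelmed at this stage. The full list of [parameters][5] is on the AWS page, and it is long. I knew my container needed some environment variables and needed to expose a port, so I defined those first with the help of a fantastic [terraform module][6], which really made this easier. Without it, I'd have been writing JSON by hand to define my container definition.
First, I defined some environment variables: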
```
container_environment_variables = [
{
name = "USER"
value = "${var.user}"
},
{
name = "PASSWORD"
value = "${var.password}"
}
]
```
Then I put together the task definition using the module mentioned above:
```
module "container_definition_app" {
source = "cloudposse/ecs-container-definition/aws"
version = "v0.7.0"
container_name = "${var.name}"
container_image = "${var.image}"
container_cpu = "${var.ecs_task_cpu}"
container_memory = "${var.ecs_task_memory}"
container_memory_reservation = "${var.container_memory_reservation}"
port_mappings = [
{
containerPort = "${var.app_port}"
hostPort = "${var.app_port}"
protocol = "tcp"
},
]
environment = "${local.container_environment_variables}"
}
```
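I was pretty confused at this point - I had to define a lot of configuration to get this running and I'd barely started, but it made some sense - anything running a Docker container needs to know something about the container's configuration. I've [previously written][7] about the problems with Kubernetes and configuration management, and the same problem seemed to be rearing its head again here.
Next, I defined the task definition from the module above (which thankfully abstracted the required JSON away from me - if I'd had to hand-write JSON, I'd probably have given up).
While defining the module parameters, I suddenly realised I was missing something. I needed an IAM role as well! Okay, let me define that: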
```
resource "aws_iam_role" "ecs_task_execution" {
name = "${var.name}-ecs_task_execution"
assume_role_policy = <<EOF
{
"Version": "2008-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow"
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
count = "${length(var.policies_arn)}"
role = "${aws_iam_role.ecs_task_execution.id}"
policy_arn = "${element(var.policies_arn, count.index)}"
}
```
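That makes sense; I'd need to define an RBAC policy in Kubernetes, so I'm still not really losing or gaining anything here. At this point I'm starting to feel that this is all very familiar from a Kubernetes perspective.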
```
resource "aws_ecs_task_definition" "app" {
family = "${var.name}"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "${var.ecs_task_cpu}"
memory = "${var.ecs_task_memory}"
execution_role_arn = "${aws_iam_role.ecs_task_execution.arn}"
task_role_arn = "${aws_iam_role.ecs_task_execution.arn}"
container_definitions = "${module.container_definition_app.json}"
}
```
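At this point I've written quite a few lines of code to get this running and read a lot of ECS documentation, and all I've done is define a task definition. I still haven't got this thing running. I'm really confused about what value this adds over a Kubernetes-based platform, but I pressed on.
#### Services
A service is partly how you expose the container to the outside world and partly how you define how many replicas it has. My first thought was “Ah! This is like a Kubernetes service!” and I set about mapping ports and so on. Here was my first attempt at the terraform: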
```
resource "aws_ecs_service" "app" {
name = "${var.name}"
cluster = "${module.ecs.this_ecs_cluster_id}"
task_definition = "${data.aws_ecs_task_definition.app.family}:${max(aws_ecs_task_definition.app.revision, data.aws_ecs_task_definition.app.revision)}"
desired_count = "${var.ecs_service_desired_count}"
launch_type = "FARGATE"
deployment_maximum_percent = "${var.ecs_service_deployment_maximum_percent}"
deployment_minimum_healthy_percent = "${var.ecs_service_deployment_minimum_healthy_percent}"
network_configuration {
subnets = ["${values(local.private_subnets)}"]
security_groups = ["${module.app.this_security_group_id}"]
}
}
```
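I got frustrated again when I had to define a security group allowing access to the required ports, but I did so and plugged it into the network configuration. Then I got a smack in the face.
I need to define my own load balancer?
What?
Surely not?
##### Load Balancers Never Go Away
Honestly, I was floored by this, and I'm not even sure why. I've become so used to Kubernetes services and Ingress objects that I completely took for granted how easy it is to get my application onto the web with Kubernetes. Of course, we spent months at $work building a platform to make this easier. I'm a heavy user of [external-dns][8] and [cert-manager][9], which automatically populate DNS entries on Ingress objects and automate TLS certificates, and I'm well aware of the work needed to set them up, but I honestly thought this would be easier on Fargate. I recognise that Fargate doesn't claim to be the be-all and end-all of running applications - it just abstracts away node management - but I have consistently been told it is *easier* than Kubernetes. I really was surprised. Defining a load balancer (even if you don't want to use Ingress and Ingress controllers) is part and parcel of deploying a service to Kubernetes, and I had to do the same thing again here. It all felt so familiar.
I now realised I needed:
* A load balancer
* A TLS certificate
* A DNS entry
So I set about making those. I used some popular terraform modules and came up with this: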
```
# Define a wildcard cert for my app
module "acm" {
source = "terraform-aws-modules/acm/aws"
version = "v1.1.0"
create_certificate = true
domain_name = "${var.route53_zone_name}"
zone_id = "${data.aws_route53_zone.this.id}"
subject_alternative_names = [
"*.${var.route53_zone_name}",
]
tags = "${local.tags}"
}
# Define my loadbalancer
resource "aws_lb" "main" {
name = "${var.name}"
subnets = [ "${values(local.public_subnets)}" ]
security_groups = ["${module.alb_https_sg.this_security_group_id}", "${module.alb_http_sg.this_security_group_id}"]
}
resource "aws_lb_target_group" "main" {
name = "${var.name}"
port = "${var.app_port}"
protocol = "HTTP"
vpc_id = "${local.vpc_id}"
target_type = "ip"
depends_on = [ "aws_lb.main" ]
}
# Redirect all traffic from the ALB to the target group
resource "aws_lb_listener" "main" {
load_balancer_arn = "${aws_lb.main.id}"
port = "80"
protocol = "HTTP"
default_action {
target_group_arn = "${aws_lb_target_group.main.id}"
type = "forward"
}
}
resource "aws_lb_listener" "main-tls" {
load_balancer_arn = "${aws_lb.main.id}"
port = "443"
protocol = "HTTPS"
certificate_arn = "${module.acm.this_acm_certificate_arn}"
default_action {
target_group_arn = "${aws_lb_target_group.main.id}"
type = "forward"
}
}
```
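The snippet above covers the certificate and the load balancer, but not the DNS name from the list. A minimal sketch of that missing piece, assuming an alias record in the same Route 53 zone: the resource name `app` and the `app.` subdomain are hypothetical and not from the original post, while `aws_lb.main` and `data.aws_route53_zone.this` are the objects already referenced above.

```hcl
# Hypothetical sketch: point a DNS name at the ALB defined above.
# The "app" record name is an assumption made for illustration.
resource "aws_route53_record" "app" {
  zone_id = "${data.aws_route53_zone.this.zone_id}"
  name    = "app.${var.route53_zone_name}"
  type    = "A"

  # An alias record follows the ALB's own DNS name, so no IPs are hardcoded.
  alias {
    name                   = "${aws_lb.main.dns_name}"
    zone_id                = "${aws_lb.main.zone_id}"
    evaluate_target_health = true
  }
}
```

Because the certificate above includes a wildcard for the zone, a name like this would be covered by it.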
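I have to admit, I screwed this up several times. I had to fish around in the AWS console to figure out what I'd done wrong. This certainly wasn't an "easy" process, and I've done all of this many times before. Honestly, at this point Kubernetes looked very appealing to me, but I realised that was because I'm so familiar with it. If I were lucky enough to be using a managed Kubernetes platform (with external-dns and cert-manager preinstalled), I'd really wonder what value I was missing from Fargate. It just really didn't feel that easy.
After a bit of back and forth, I now had a working ECS service. The final definition, including the service, looked a bit like this: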
```
data "aws_ecs_task_definition" "app" {
task_definition = "${var.name}"
depends_on = ["aws_ecs_task_definition.app"]
}
resource "aws_ecs_service" "app" {
name = "${var.name}"
cluster = "${module.ecs.this_ecs_cluster_id}"
task_definition = "${data.aws_ecs_task_definition.app.family}:${max(aws_ecs_task_definition.app.revision, data.aws_ecs_task_definition.app.revision)}"
desired_count = "${var.ecs_service_desired_count}"
launch_type = "FARGATE"
deployment_maximum_percent = "${var.ecs_service_deployment_maximum_percent}"
deployment_minimum_healthy_percent = "${var.ecs_service_deployment_minimum_healthy_percent}"
network_configuration {
subnets = ["${values(local.private_subnets)}"]
security_groups = ["${module.app_sg.this_security_group_id}"]
}
load_balancer {
target_group_arn = "${aws_lb_target_group.main.id}"
container_name = "app"
container_port = "${var.app_port}"
}
depends_on = [
"aws_lb_listener.main",
]
}
```
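I felt like I was close to done, but then I remembered I had only completed 2 of the 3 steps required by the original "Getting Started" documentation: I still needed to define the ECS cluster.
#### The cluster
Thanks to a [well-defined module][10], defining the cluster to run all of this on was actually very easy.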
```
module "ecs" {
source = "terraform-aws-modules/ecs/aws"
version = "v1.1.0"
name = "${var.name}"
}
```
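What surprised me here was why I had to define a cluster at all. As someone familiar with ECS, it makes sense that you'd need a cluster, but I tried to consider this from the point of view of a newcomer having to go through this process. To me, it seems surprising that Fargate bills itself as "serverless" and yet you still need to define a cluster. It's a small detail, but it really stuck in my mind.
### Tell me your secrets
At this stage, I was fairly happy that I'd managed to get something running. However, something was still missing from my original criteria. If we go back to the task definition, you'll remember that my app has an environment variable holding the password: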
```
container_environment_variables = [
{
name = "USER"
value = "${var.user}"
},
{
name = "PASSWORD"
value = "${var.password}"
}
]
```
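If I looked at my task definition in the AWS console, my password was right there, staring at me in plaintext. I didn't want that, so I set about trying to move it into something else, similar to [Kubernetes secrets][11].
#### AWS SSM
The way Fargate / ECS handles the secret management part is to use [AWS SSM][12] (the full name of this service is AWS Systems Manager Parameter Store, but I don't want to use that name because, quite frankly, it's silly).
The AWS documentation [covers this][13] fairly well, so I set about converting it to terraform.
##### Specifying the secret
First, you have to define a parameter and give it a name. In terraform, it looks like this: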
```
resource "aws_ssm_parameter" "app_password" {
name = "${var.app_password_param_name}" # The name of the value in AWS SSM
type = "SecureString"
value = "${var.app_password}" # The actual value of the password, like correct-horse-battery-staple
}
```
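Obviously, the key component here is the "SecureString" type. This uses the default AWS KMS key to encrypt the data, something that was not immediately obvious to me. This is a huge advantage over Kubernetes secrets, which are not encrypted in etcd by default.
Then I specified another local value map for ECS and passed it in as the secrets parameter: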
```
container_secrets = [
{
name = "PASSWORD"
valueFrom = "${var.app_password_param_name}"
},
]
module "container_definition_app" {
source = "cloudposse/ecs-container-definition/aws"
version = "v0.7.0"
container_name = "${var.name}"
container_image = "${var.image}"
container_cpu = "${var.ecs_task_cpu}"
container_memory = "${var.ecs_task_memory}"
container_memory_reservation = "${var.container_memory_reservation}"
port_mappings = [
{
containerPort = "${var.app_port}"
hostPort = "${var.app_port}"
protocol = "tcp"
},
]
environment = "${local.container_environment_variables}"
secrets = "${local.container_secrets}"
}
```
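##### A problem arises
At this point, I redeployed my task definition and was very confused. Why wasn't the task rolling out properly? Even though the new task definition (version 8) was available, the console kept showing the running app still using the previous task definition (version 7). It took me longer than I expected to figure this out, but on the events screen in the console I noticed an IAM error. I had missed a step: the container couldn't read the secret from AWS SSM because it didn't have the correct IAM permissions. This was the first time I got genuinely frustrated with this whole thing. From a user experience perspective, the feedback here was *terrible*. If I hadn't noticed, I would have thought everything was fine, because there was still a task running and my app was still reachable at the correct URL, just with the old configuration.
In Kubernetes, I would have clearly seen an error in the pod definition. It's absolutely fantastic that Fargate makes sure my app doesn't go down, but as an operator I need some actual feedback about what's happening. This really isn't good enough. I genuinely hope someone from the Fargate team reads this and improves the experience.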
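The post doesn't show the fix, so the following is only a sketch of what the missing permission could look like. It assumes the task definition's execution role is managed in terraform as `aws_iam_role.task_execution`, which is a hypothetical name; `aws_ssm_parameter.app_password` is the parameter defined earlier. Per the AWS documentation linked above, ECS needs `ssm:GetParameters` on the parameter, and `kms:Decrypt` only becomes necessary when a custom KMS key is used instead of the default one.

```hcl
# Hypothetical sketch: let the ECS task execution role read the SSM parameter.
# aws_iam_role.task_execution is assumed to be the role passed as
# execution_role_arn on the task definition; it is not defined in this post.
data "aws_iam_policy_document" "read_app_password" {
  statement {
    actions   = ["ssm:GetParameters"]
    resources = ["${aws_ssm_parameter.app_password.arn}"]
  }
}

resource "aws_iam_role_policy" "read_app_password" {
  name   = "${var.name}-read-app-password"
  role   = "${aws_iam_role.task_execution.id}"
  policy = "${data.aws_iam_policy_document.read_app_password.json}"
}
```

Scoping `resources` to the single parameter ARN keeps the grant narrow, which matters given how broad the rest of the IAM permissions ended up.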
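### That's it
That was the end of the road: my app was running and it met all of my criteria. I did realise there were still some improvements to make, including:
* Defining a cloudwatch log group, so I could write logs properly
* Adding a route53 hosted zone to make the whole thing easier to automate from a DNS perspective
* Fixing and rescoping the IAM permissions, which were far too broad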
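The first item in that list could be sketched roughly as follows. The log group name, the 30-day retention, and `var.region` are all assumptions made for illustration. The option keys are the standard ones for the `awslogs` log driver, but how they get passed into the container definition depends on the module version in use, so that wiring is deliberately left out.

```hcl
# Hypothetical sketch: a log group for the app's container logs.
# The name and retention period are arbitrary illustrative choices.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30
}

locals {
  # Options for the awslogs log driver in the container definition.
  # var.region is assumed to exist; it is not defined in this post.
  app_log_options = {
    "awslogs-group"         = "${aws_cloudwatch_log_group.app.name}"
    "awslogs-region"        = "${var.region}"
    "awslogs-stream-prefix" = "${var.name}"
  }
}
```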
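But honestly, at this point I wanted to reflect on the experience. I wrote up a [twitter thread][14] about it, and then spent the rest of the time thinking about how I really felt.
### The cost
After an evening of reflection, I realised that this process is largely the same whether you're using Fargate or Kubernetes. What surprised me most was that, despite the claims I regularly hear that Fargate is "easier", I really couldn't see any benefit over a Kubernetes platform. Now, if you're building Kubernetes clusters yourself, I can absolutely see the value here: managing nodes and the control plane is just overhead you don't need. The problem is that most consumers of a Kubernetes based platform *don't* have to do this. If you're lucky enough to be using GKE, you barely need to think about managing the cluster, and you can run one with a single gcloud command. I regularly use Digital Ocean's managed Kubernetes service, and I can safely say it is as easy as spinning up a Fargate cluster. In fact, in some ways it's easier.
Having to define some infrastructure to run your container on is the cost at this point. Google may have just changed the game this week with their [Google Cloud Run][15] product, but they are way ahead of everyone else in this area.
What I can safely say from this whole experience is this: *running containers at scale is still hard.* It requires thought, it requires domain knowledge, and it requires collaboration between operators and developers. It also requires a foundation to build on: any application on AWS needs some infrastructure defined and running beforehand. I'm very intrigued by the "NoOps" concept that some companies seem to be hankering for. I suppose if you're running a stateless application that fits entirely inside a lambda function and an API gateway, you're probably in a good position, but are we really close to that in any kind of enterprise environment? I really don't think so.
#### Fair comparisons
Another realisation that struck me is that comparisons between technology A and technology B are often not very fair, and I see this a lot with AWS. The reality of the situation is often very different from the Jeff Barr blog post. If you're a small enough company that you can deploy your application using the AWS console and accept all the defaults, this absolutely is easier. However, I didn't want to use the defaults, because the defaults are almost never production ready. Once you start to peel back the layers of a cloud provider's services, you begin to realise that at the end of the day you're still running software. It still needs to be designed well, deployed well, and operated well. I believe the value-added services of AWS, Kubernetes, and all the other cloud providers make it much easier to design, run, and operate things, but it definitely isn't free.
#### Arguments for Kubernetes
My final takeaway is this: if you view Kubernetes purely as a container orchestration tool, you're probably going to love Fargate. However, as I've become more familiar with Kubernetes, I've come to appreciate how important it is as a technology, not just because it's a great container orchestration tool, but also because of its design patterns: it's a declarative, API driven platform. One simple thing I noticed during the *entire* Fargate process was that if I deleted something, Fargate wouldn't necessarily recreate it for me. Autoscaling is nice, and not having to manage servers or OS patching and updates is awesome, but I felt I'd lost a lot by not being able to use Kubernetes' self-healing, API driven model. Sure, Kubernetes has a learning curve, but judging from this experience, so does Fargate.
### Summary
Despite my confusion during parts of the process, I really did enjoy this experience. I still believe Fargate is a fantastic technology, and what the AWS team has done with ECS/Fargate is genuinely remarkable. My view, however, is that it is definitely not "easier" than Kubernetes. It's just... hard in different ways.
The problems that arise when running containers in production are largely the same. If you take anything away from this post, it should be this: *whichever way you choose is going to have operational overhead*. Don't fall for the belief that you can just pick something and your world will get easier. My personal opinion is this: if you have an operations team and your company will be deploying containers across multiple app teams, pick one technology and build processes and tooling around it to make it easier.
From now on, I'm certainly going to take people's claims that a particular technology is easier with a grain of salt. At this stage, when it comes to Fargate, the comic below sums up my feelings: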
![][16]
--------------------------------------------------------------------------------
via: https://leebriggs.co.uk/blog/2019/04/13/the-fargate-illusion.html
Author: [Lee Briggs][a]
Topic selection: [lujun9972][b]
Translator: [wxy](https://github.com/wxy)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://leebriggs.co.uk/
[b]: https://github.com/lujun9972
[1]: https://matthias-endler.de/2019/maybe-you-dont-need-kubernetes/
[2]: https://aws.amazon.com/fargate/
[3]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_GetStarted.html
[4]: https://i.imgur.com/YfMyXBdl.png
[5]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
[6]: https://github.com/cloudposse/terraform-aws-ecs-container-definition
[7]: https://leebriggs.co.uk/blog/2018/05/08/kubernetes-config-mgmt.html
[8]: https://github.com/kubernetes-incubator/external-dns
[9]: https://github.com/jetstack/cert-manager
[10]: https://github.com/terraform-aws-modules/terraform-aws-ecs
[11]: https://kubernetes.io/docs/concepts/configuration/secret/
[12]: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-paramstore.html
[13]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html
[14]: https://twitter.com/briggsl/status/1116870900719030272
[15]: https://cloud.google.com/run/
[16]: https://i.imgur.com/Bx7Q50Jl.jpg