3 years on Google App Engine. An Epic Review.

For the last 3 years I worked on an application that runs on Google App Engine. It is a fascinating, unique piece of service Google is offering here. Unlike anything you’ll find elsewhere. This is my in-depth, personal take on it.

Google’s Cloud (est. 2008)

First of all, what is Google App Engine (GAE) actually? It is a platform to run your web applications on. Like Heroku. But different when you look closer. It is also a versatile cloud computing platform. Like AWS. But different. Let me explain.

Google launched GAE in 2008, when cloud computing was still in its infancy. Amazon was ahead of them since they already started renting out their IT infrastructure in 2006. But with GAE, Google offered a sophisticated Platform-as-a-Service (PaaS) very early on that would be matched by Amazon with its Elastic Beanstalk service in 2011. Now what is so special about GAE?

It is a fully managed application platform. So far, I do not know a platform which comes close to GAE’s full package: log management, mail delivery, scaling, memcache, image manipulation, distributed Cron jobs, load balancing, version management, task queue, search, performance analysis, cloud debugging, content delivery network - and that is not even mentioning auxiliary services that have popped up on Google’s cloud in the meantime like SQL, BigQuery, file storage… the list goes on.

By using Google App Engine, you can run your app on top of (probably) the world’s best infrastructure. Also, you receive functionality out of the box that would take at least a dozen add-ons from third parties on Heroku or a few weeks of setup if done on your own. This is GAE’s appeal.

Notable applications running on GAE include Snapchat and Khan Academy.

Development

The web app I was working on all this time is a single, large Java application. App Engine also supports Python, PHP and Go. Now you might wonder why the selection is so limited. One reason is that in order to have a fully-managed environment, Google needs to integrate the platform with the environment. You could say that environment and platform are tightly coupled. That takes a lot of effort and investment which becomes very clear once you start developing for GAE.

SDK

Every application needs to use a special SDK (Software Development Kit) to use the APIs offered by GAE. The SDK is huge. For example, the Java SDK download comes in at roughly 190 MB. Granted, some of the JARs in there are not needed for most use cases and some only during development - but still, it certainly is not lightweight (even for Java, that is).

The SDK is not just your bridge to the world of Google App Engine but also serves as its simulation on your local machine. For virtually every GAE API it features a stub that you can develop against. First of all, this means that when you run your app locally you’ll get quite close to how it behaves in production. Secondly, you can easily write integration tests against the APIs. Usually this gets you very far; the mismatch between production and stub behavior is very small.

Java APIs

Speaking of APIs, you are in for a surprise when you use certain Java APIs. Because GAE runs your application in a sandbox, it prohibits the use of specific Java APIs. The main restrictions include writing to the file system, certain methods of java.lang.System and using the Java Native Interface (JNI). There are also peculiarities about using threads and sockets but more on that later.

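One interesting thing is that the Java SDK actually makes sure you do not use these restricted APIs locally. When running the application or just an integration test, it employs a Java agent that monitors your every method call. It immediately throws an exception for any violation it detects. This helps to find violations early rather than only in production, but it has a nasty side effect. When you profile the application’s performance, the agent’s violation checks dominate the results. In the end, it is hard to judge the actual performance of your application because the more method calls you make, the more overhead the agent generates.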

Java Development Kit (JDK)

When you start developing, the next thing you will probably notice is that you cannot use Java 8. Even though Java 7’s end of life was in 2015, it is still very much alive and kicking on GAE. The third highest voted issue on GAE’s issue tracker is support for Java 8 (the second highest is support for Python 3). It was created in 2013. Since then, the only shred of news about any progress on the matter is a post on the App Engine mailing list from 2016, stating engineers are actively working on it. Well, good for you.

Obviously, this limitation is a major annoyance for any developer. For me personally, the missing lambda support weighs very heavily. Of course, one could migrate to one of the many JVM languages like Groovy, Scala or Kotlin which all offer a lot more features than Java 8. But this is a costly and risky investment to make. Too costly and risky for our project. We also investigated the feasibility of retrolambda, a backport of lambdas to Java 7, but have not pursued it yet although it looked promising in first tests.

Having to stay with an old version is also a liability for the business. It makes it harder to find developers. Overall application security is threatened, as well. Google support told us we would still receive security patches for our production JDK 7. But eventually, all major libraries like Spring will stop supporting it. Eventually, you’ll be stuck.

Deployment

To deploy your application, you need to create an appengine-web.xml configuration file. There, you specify the application ID and version plus some additional settings, e.g. marking the app as threadsafe to be able to receive multiple requests per instance simultaneously.

Upload

App Engine expects to receive your Java application as a packaged WAR file. You can upload it to their servers with the appcfg script from the SDK. Optionally, there are plugins for Maven and Gradle which make this as easy as writing mvn appengine:update. The upload can take quite a while for a typical Java application, so you had better have a fast internet connection. Once the process is finished, you can see your newly deployed version in the Google Cloud Console:

Google Cloud Console - Versions

Static Files

Static files like images, stylesheets and scripts are part of any web application today. In the appengine-web.xml, files can be marked as static. Google will serve these files directly - without hitting your application. It is not exactly a Content Delivery Network (CDN) since it is not distributed to hundreds of edge nodes, but it helps to reduce the load on your servers.

Versions

The nice thing in App Engine is that everything you deploy has a specific version. Every version can be accessed at https:// -dot- .appspot.com. But which one is actually live?

You can mark a version as default. This means when you go to https://.appspot.com (or the domain name you specified for the app), that will be the version receiving all the requests. Switching a version to default is very easy: all it takes is a button click or simple terminal command. GAE can switch immediately or migrate your traffic incrementally to prevent overwhelming the new version.

There is also one option (which we never used) that allows you to distribute your traffic across multiple versions. This allows incrementally rolling out a new version by only giving it to a fraction of the user base before making it available for everyone.

Since it is so easy to create new versions and switch production traffic between them, GAE is a perfect platform to practice blue-green deployment. Each time we had the need to roll back due to a bug in the new version, it was effortless. Continuous Delivery should also be achievable by writing a somewhat smart deployment script.

Instances

Every version can run any number of instances (the only limit is your credit card). The actual number is the result of incoming traffic and the scaling configuration of your app; we’ll look at that later. Google will distribute incoming requests between all running instances of that version. You can see a list of instances, including some basic metrics like requests and latency, in the Google Cloud Console:

Google Cloud Console - Instances

The hardware options you can choose from to run these instances are - let’s be frank here - pathetic. App Engine basically offers four different instance classes ranging from 128MB and 600MHz CPU (you read that correctly) to 1024MB and 2.4GHz CPU. Yes, again, that is true. And truly sad. On a developer’s laptop our app started almost twice as fast as in production.

Services

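So far, I have only talked about a single monolithic application. But what if yours is composed of multiple services? App Engine has you covered. Every app is a service. If you only have one, it is just called default. You can access each one directly via https://-dot--dot-.appspot.com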

App Engine Application, Service, Version and Instance

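You can easily deploy multiple versions of each service, and scale and monitor them separately. Since each service is separate from the others, you can run any combination of the supported languages. Unfortunately, some configuration settings are shared across all services. Hence, they are not perfectly isolated. Still, all in all, GAE seems like a good fit for microservices. There is some detailed documentation from Google on this topic.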

For reasons that will become clear later, we decided to separate our application into two services: frontend (user-facing) and backend (background work). But to do so, we didn’t actually split the monolith in two - that would have taken months. We simply deployed the same app twice and only sent users to one service and background work to the other.

Operations

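Let’s talk about what it means to run your application on App Engine. As you will see, a lot of restrictions are imposed on you. But it is not all doom and gloom. In the end, you will understand why.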

Application Startup

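When App Engine starts a new instance, the application needs to initialize. App Engine will either send an HTTP request from a user directly to the application or - if the configuration and scaling circumstances allow it - send a so-called warmup request. Either way, the first request is called a loading request. As you can imagine, starting quickly is important.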

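The instance itself, on the other hand, starts very quickly. If you have started a server in the cloud before, you may have waited more than a minute. Not on GAE. Instances start almost instantly. I guess Google keeps a pool of ready-to-go servers. The bottleneck is always your own application. Our application took more than 40 seconds to start in production. So unless we wanted to split our huge monolith into separate services, we needed it to start more efficiently.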

The app uses Spring. Google even has a dedicated documentation entry just for that: Optimizing Spring Framework for App Engine Applications. There we found the inspiration for our most important startup optimization.

We got rid of Spring’s classpath scanning. It is particularly slow on App Engine (probably due to the abysmal CPU). Luckily, there is a library called ClassIndex. It writes the fully qualified names of classes with a special annotation to a text file. By simply reading the beans from the text file, the Spring initialization went down by about 8-10 seconds.
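To illustrate the idea, here is a minimal sketch in plain Java. It is not the ClassIndex API; the class name `IndexedClassLoaderSketch` and the one-class-name-per-line index format are assumptions made for this example. Instead of scanning the classpath, it loads only the classes listed in a pre-built index.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the ClassIndex API: loads classes listed in a
// pre-built index instead of scanning the whole classpath at startup.
class IndexedClassLoaderSketch {

    // Expects one fully qualified class name per line; blank lines are skipped.
    static List<Class<?>> loadIndexedClasses(Reader index) {
        List<Class<?>> classes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(index)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String className = line.trim();
                if (!className.isEmpty()) {
                    classes.add(Class.forName(className));
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not load class index", e);
        }
        return classes;
    }

    public static void main(String[] args) {
        // In a real build the index would be generated at compile time; here it is inlined.
        String index = "java.lang.String\njava.util.ArrayList\n";
        for (Class<?> type : loadIndexedClasses(new StringReader(index))) {
            System.out.println(type.getName());
        }
    }
}
```

The cost of this approach grows with the number of indexed classes rather than with the size of the classpath, which is presumably why it helps so much on a slow CPU.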

Request Handling

The very first thing I have to mention here is the requirement of the App Engine to handle a user request within 60 seconds and a background request in 10 minutes. When the application takes too long to respond, the request is aborted with a 500 status code and a DeadlineExceededException is thrown.

Usually, this shouldn’t be a problem. If your app takes more than 60 seconds to respond, odds are the user is long gone anyway. But since an instance is started via an HTTP request, this also means it has to start in 60 seconds. In production, we observed variations in startup time of up to 10 seconds. This means you now have less than 50 seconds to start your app. It is not uncommon for a Java app to take that long.

One nice little feature I want to highlight is the geographical HTTP headers: for every incoming user request, Google adds headers containing the user’s country, region, city and the latitude and longitude of said city. This can be very useful, for example for pre-filling phone number country codes or detecting unusual account login locations. The accuracy also seems pretty high from our observations. It is usually very cumbersome and/or expensive to get that kind of information with this level of accuracy from a third party API or database. So getting it for free on App Engine is a nice bonus.
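Reading these headers could look roughly like the sketch below. The header names (`X-AppEngine-Country`, `X-AppEngine-City`, `X-AppEngine-CityLatLong`) are given from memory of the App Engine documentation and should be double-checked; the class `GeoHeadersSketch` is made up for this example. It works on a plain map so it runs without the servlet API; in a servlet you would call `request.getHeader(...)` instead.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: extracts App Engine's geo headers from a plain header map.
class GeoHeadersSketch {
    final String country;   // country code, e.g. "DE"
    final String city;      // city name as sent by App Engine
    final double latitude;
    final double longitude;

    GeoHeadersSketch(String country, String city, double latitude, double longitude) {
        this.country = country;
        this.city = city;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    // Returns null when the "latitude,longitude" header is missing or malformed.
    // Note: the lookup is case-sensitive here, unlike real HTTP headers.
    static GeoHeadersSketch fromHeaders(Map<String, String> headers) {
        String latLong = headers.get("X-AppEngine-CityLatLong");
        if (latLong == null) {
            return null;
        }
        String[] parts = latLong.split(",");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new GeoHeadersSketch(
                    headers.get("X-AppEngine-Country"),
                    headers.get("X-AppEngine-City"),
                    Double.parseDouble(parts[0].trim()),
                    Double.parseDouble(parts[1].trim()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-AppEngine-Country", "DE");
        headers.put("X-AppEngine-City", "berlin");
        headers.put("X-AppEngine-CityLatLong", "52.520000,13.405000");
        GeoHeadersSketch geo = fromHeaders(headers);
        System.out.println(geo.country + " " + geo.city + " " + geo.latitude + " " + geo.longitude);
    }
}
```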

Background Work

Threads

As mentioned before, there are restrictions on using Java threads. While it is possible to start a new thread, albeit only via a custom GAE ThreadManager, it cannot ‘outlive’ the request it was created in. This can be annoying in practice since third party libraries don’t follow App Engine’s restrictions, of course. Finding a compatible library or adapting a seemingly incompatible one cost us a lot of sweat and tears over the years. For example, we could not use the Dropwizard Metrics library out of the box since it relies on using a background thread.

Queues

But there are other ways of doing background work: In the spirit of the Cloud, you apply the divide and conquer approach on the instance level. By using task queues you can enqueue work for later processing. For example, when an email needs to be sent, you can enqueue a new task with a payload (e.g. recipient, subject and body) and a URL on a push queue. Then, one of your instances will receive the payload as an HTTP POST request to the specified endpoint. If it fails, App Engine will retry the operation.

This pattern really shines when you have a lot of work to process. Simply enqueue a batch of tasks that run in isolation. The App Engine will take care of failure handling. No need for custom retry code. Just imagine how awkward it would be without it: running hundreds of tasks at once you either need to stop and start from scratch when an error occurs or carefully track which have failed and enqueue them again for another attempt.
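The retry behavior can be modeled in a few lines of plain Java. This is an in-memory sketch of the semantics only, not the App Engine Task Queue API; `PushQueueSketch`, `Handler` and the way the retry limit is counted (one initial attempt plus `retryLimit` retries) are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory model of a push queue with a retry limit; not the App Engine API.
class PushQueueSketch {

    // Stand-in for the HTTP endpoint: true means success (think 2xx response).
    interface Handler {
        boolean handle(String payload);
    }

    private static final class Task {
        final String payload;
        int attempts;

        Task(String payload) {
            this.payload = payload;
        }
    }

    private final Deque<Task> tasks = new ArrayDeque<>();
    private final int retryLimit;

    PushQueueSketch(int retryLimit) {
        this.retryLimit = retryLimit;
    }

    void enqueue(String payload) {
        tasks.add(new Task(payload));
    }

    // Delivers every task. A failed task goes back to the end of the queue until
    // its retries are used up. Returns the payloads that failed permanently.
    List<String> drain(Handler handler) {
        List<String> failed = new ArrayList<>();
        while (!tasks.isEmpty()) {
            Task task = tasks.poll();
            task.attempts++;
            boolean ok;
            try {
                ok = handler.handle(task.payload);
            } catch (RuntimeException e) {
                ok = false; // an exception counts as a failed delivery
            }
            if (ok) {
                continue;
            }
            if (task.attempts <= retryLimit) {
                tasks.add(task);
            } else {
                failed.add(task.payload);
            }
        }
        return failed;
    }
}
```

The point of the model is that each task succeeds or fails on its own: one broken task is retried in isolation while the rest of the batch completes normally.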

And just like the rest of the App Engine, task queues scale beautifully. A queue can receive virtually unlimited tasks. The downside is the payload can only be up to 1 MB, though. But what we usually did was to simply pass references to data to the queue. But then, you need to take extra good care in your data handling since it can easily happen that something vanishes between the time you enqueue a task and the time that task is actually executed.

The queues are configured in a queue.xml file. Here is an example of a push queue that fires up to one task per second with a maximum of two retries:

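    <queue-entries>
      <queue>
        <name>my-push-queue</name>
        <rate>1/s</rate>
        <retry-parameters>
          <task-retry-limit>2</task-retry-limit>
        </retry-parameters>
      </queue>
    </queue-entries>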

Cron

Another very valuable tool is the distributed Cron. In a cron.xml you can tell GAE to issue requests at certain time intervals. These are just simple HTTP GET requests one of your instances will receive. The smallest interval possible is once per minute. It is very useful for regular reports, emails and cleanups.

This is what an entry in cron.xml looks like:

    <cronentries>
      <cron>
        <url>/tasks/summary</url>
        <schedule>every 24 hours</schedule>
      </cron>
    </cronentries>

A Cron job can also be combined with pull queues: they allow you to actively fetch a batch of tasks from a queue. Depending on the use case, making an instance pull lots of tasks in a batch can be much more efficient than pushing them to the instance individually.

Like all other App Engine configuration files, the cron.xml is shared across all services and versions of an application. This can be annoying. In our case, sometimes when we deployed a version where a new Cron entry had been added, App Engine would start sending requests to an endpoint which did not exist on the live (but older) version - generating noise for our production error reporting. I imagine this must be even more painful when using App Engine to host microservices.

Also, Cron jobs are not run locally. I can understand why that might be: a lot of the jobs are usually scheduled outside the usually busy time and would therefore not even be triggered during a regular workday. But some run like every few minutes or hours - and those are really interesting to observe. They might trigger notifications, for example. You want to see those locally. Because eventually you will introduce a change that leads to undesirable behavior (as has happened multiple times in our project) and seeing it locally might prevent you from shipping it. But simulating the Cron jobs locally is tricky (we didn’t bother, unfortunately). One would probably need to write an external tool that parses the cron.xml and then calls the endpoints (yikes!).
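The parsing half of such a tool is not much code. The following is only a sketch under the assumption that cron.xml consists of `<cron>` elements with a `<url>` and a `<schedule>` child; the class name `CronXmlReaderSketch` is made up. Interpreting the schedule expressions and actually issuing the GET requests is the tricky part and is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of the first step of a local Cron simulator: list what is configured.
class CronXmlReaderSketch {

    // Returns one "url -> schedule" string per <cron> entry.
    static List<String> readEntries(String cronXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(cronXml.getBytes(StandardCharsets.UTF_8)));
            NodeList crons = doc.getElementsByTagName("cron");
            List<String> entries = new ArrayList<>();
            for (int i = 0; i < crons.getLength(); i++) {
                Element cron = (Element) crons.item(i);
                String url = cron.getElementsByTagName("url").item(0).getTextContent().trim();
                String schedule = cron.getElementsByTagName("schedule").item(0).getTextContent().trim();
                entries.add(url + " -> " + schedule);
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse cron.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cronentries><cron><url>/tasks/summary</url>"
                + "<schedule>every 24 hours</schedule></cron></cronentries>";
        for (String entry : readEntries(xml)) {
            System.out.println(entry);
        }
    }
}
```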

Scaling

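App Engine will manage the number of instances based on the traffic. How? Well, that depends on how you configured your application. There are three modes: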

The most useful and interesting one here certainly is the automatic mode. It has a few parameters that help to shed some light on how it works internally: max_concurrent_requests, max_idle_instances, min_idle_instances and max_pending_latency. To quote the App Engine documentation:

The App Engine scheduler decides whether to serve each new request with an existing instance (either one that is idle or accepts concurrent requests), put the request in a pending request queue, or start a new instance for that request. The decision takes into account the number of available instances, how quickly your application has been serving requests (its latency), and how long it takes to spin up a new instance.

Every time we tried to tweak those numbers, it felt like practicing black magic. It is very difficult to actually deduce a good setup here. Yet, these numbers determine the real-world performance of your app and hugely affect your monthly bill.

But all in all, the automatic scaling is pretty wicked. It is an especially good fit for handling background work (e.g. generating reports, sending emails) since it often - more so than user requests - comes in large, sudden bursts.

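However, Java is a terrible fit for this kind of automatic scaling because of its slow startup time. To make matters worse, the scheduler will assign a request to a starting (cold) instance. Then, all the effort that went into sub-second response times goes out of the window. Since 2012, there has been an issue for user-facing requests never to be locked to cold instances. It has not even elicited the slightest comment by Google other than the status change to ‘Accepted’ (sounds like one of the stages of grief at this point).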

This also explains why we split our app into two services. Before, we often found that with a surge in background requests, the user requests would suffer. This is because App Engine scaled the instances up immensely and, since requests are routed evenly across instances, this led to more user requests hitting cold instances. By splitting the app we significantly reduced this from happening. Also, we were able to apply different scaling strategies for the two services.

One last thing: on the side, I used Go on App Engine and discovered a new perspective on the App Engine. Among Go’s traits is the ability to start an application virtually instantly. This makes App Engine and Go a perfect combination, like Batman and Robin. Together, they embody everything I personally expected from the Cloud ever since I learned about it. It truly scales to the workload and does so effortlessly. Not even the abysmal hardware options seemed to pose a real problem for Go since it is that efficient.

Data

When App Engine launched, the only database options you had were Google Datastore for structured data and Google Blobstore for binary data. Since then, they have added Google Cloud SQL (managed MySQL) and Google Cloud Storage (like Amazon’s S3) which replaced the Blobstore. From the beginning App Engine offered a managed Memcache, as well.

It used to be very difficult to connect to a third-party database since you could only use HTTP for communication. But usually databases require raw TCP. This only changed a few years ago when the Socket API was released. But it is still in Beta, which makes it a problematic choice for mission-critical usage. So database-wise, there is still very much of a vendor lock-in.

Anyway, in the beginning, there was only the Datastore.

Datastore

The Datastore is a proprietary NoSQL database, fully managed by Google. It is unlike anything I had ever used before. It is a massively scaling beast with very unique traits, guarantees and restrictions.

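In the early days, the Datastore was based on a master-slave setup which featured strongly consistent reads. A few years in, after it had suffered a few severe outages, Google introduced a new configuration option: High Replication. The API stayed the same, but the latency of writes increased and some reads became eventually consistent (more on that later). The upside was a significantly increased availability. It even has a 99.95% uptime SLA. In all the time I have worked with it, I have never experienced a single issue with the Datastore’s availability. It is just something you do not have to think about.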

Entities

The basics of the Datastore are simple. You can read and write entities. They are categorized into a specific kind. An entity consists of properties. A property has a name and a value of a certain type. Like string, boolean, float or integer. Each entity also has a unique key.

Writing

There is no schema whatsoever, though. Entities with the same kind can look completely different. This makes development very easy: just add a new property, save it and it will be there. The flip side is that you will need to write custom migration code to rename properties. The reason for this is that an entity cannot be updated in place - it must be loaded, changed and saved again. And depending on the volume of entities, this can become a non-trivial task since you might need to use the task queue to circumvent the request time requirements. In my experience, this leads to old property names all over the place since refactoring is so costly and dangerous.
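Such a rename boils down to a load-change-save loop. The sketch below models entities as plain maps so that it runs without the Datastore; `PropertyRenameSketch` and the property names in the usage note are invented for illustration. In a real migration, each batch would be loaded from the Datastore inside a task-queue request and the changed entities saved back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a property rename over one batch of entities (modeled as maps).
class PropertyRenameSketch {

    // Renames the property on every entity that still has the old name and
    // returns the entities that changed, i.e. the ones that need to be saved again.
    static List<Map<String, Object>> renameProperty(
            List<Map<String, Object>> batch, String oldName, String newName) {
        List<Map<String, Object>> changed = new ArrayList<>();
        for (Map<String, Object> entity : batch) {
            if (entity.containsKey(oldName)) {
                entity.put(newName, entity.remove(oldName));
                changed.add(entity);
            }
        }
        return changed;
    }
}
```

Calling `renameProperty(batch, "surname", "lastName")` skips entities that were already migrated, so a failed and retried batch does no harm.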

There are a few restrictions when working with entities. The two most critical are:

In practice, this can be an issue. We rarely hit the size limit - but when we did, it was painful. Customer data can get lost. When you hit the write rate limitation, it is usually fine on the next try. But of course you have to design your application to minimize the odds of that. For example, something like a regularly updated counter takes a lot of work to get right. Google even has a documentation entry on using sharding to build a counter.
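The idea behind a sharded counter fits into a few lines. This is an in-memory sketch, not Google's implementation: in the Datastore each shard would be its own entity so that concurrent increments rarely hit the same one. The class name and the choice of picking a shard at random are assumptions.

```java
import java.util.Random;

// In-memory model of a sharded counter: writes are spread over several shards,
// reads have to sum all of them.
class ShardedCounterSketch {
    private final long[] shards;
    private final Random random;

    ShardedCounterSketch(int shardCount, Random random) {
        this.shards = new long[shardCount];
        this.random = random;
    }

    // Each increment touches only one randomly chosen shard.
    void increment() {
        shards[random.nextInt(shards.length)]++;
    }

    // The total is the sum over all shards, so reads get more expensive
    // as the number of shards grows.
    long total() {
        long sum = 0;
        for (long shard : shards) {
            sum += shard;
        }
        return sum;
    }

    public static void main(String[] args) {
        ShardedCounterSketch counter = new ShardedCounterSketch(10, new Random());
        for (int i = 0; i < 1000; i++) {
            counter.increment();
        }
        System.out.println(counter.total());
    }
}
```

The trade-off is visible even in the toy version: more shards mean a higher sustainable write rate but more reads to compute the total.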

Reading

An entity can be fetched by using its key or via a query. Reads by key are strongly consistent, meaning you will receive the latest data even if you updated the entity right before fetching it. However, this is not true for queries. They are eventually consistent. So writes are not always reflected immediately. This can lead to problems and might need to be mitigated, for example by clever data modelling (e.g. using mnemonic as key) or leveraging special Datastore features (e.g. entity groups).

A query always specifies an entity kind and optional filters and/or sort orders. Every property that is used in a filter or as a sort key must be indexed. Adding an index can only be done as part of the regular write operation. Not automatically in the background as in most SQL databases. The index will also increase the time of the write operation and the cost (more on that later).

If a query involves multiple properties, it requires a multi-index. It must be specified in a configuration file called datastore-indexes.xml. Here is an example:

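    <datastore-indexes>
      <datastore-index kind="Employee" ancestor="false">
        <property name="lastName" direction="asc" />
        <property name="hireDate" direction="desc" />
      </datastore-index>
    </datastore-indexes>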

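Unlike in other databases, a missing multi-index does not result in an inefficient, slow query - the query fails immediately. The Datastore tries its best to enforce performant queries. For example, inequality filters are only supported on a single property. Of course, there are always ways to shoot yourself in the foot - but they are rare.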

There are several other features I cannot go into now, for example pagination, projection queries and transactions. Go to the Datastore documentation to learn more, it is very extensive and helpful.

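Compared to other databases, read and write operations are very slow. Based on my observations, a read by key takes 10-20 ms on average. Significant deviations are rare. My best guess is that Google serializes the entities and only the indexes are actually kept in memory.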

The pricing model seems to support that: you pay for stored data, read, write and delete operations. That’s it. Note that database memory is not in that list. The operations themselves are cheap as well: reading 100k entities costs $0.06, 100k write operations cost $0.18 - a write operation can be the actual entity write but also every index write. If you don’t write anything, you don’t pay anything. But in a single minute you could be writing gigabytes of data. And here’s the kicker: The read and write performance is basically the same for a database with no entities or a billion. It scales like crazy.
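As a back-of-the-envelope sketch, the bill for operations can be estimated from the two prices quoted above. The class `DatastoreCostSketch` is made up, storage cost is ignored, and counting one entity write plus a fixed number of index writes per saved entity is a simplifying assumption; the real accounting is more detailed.

```java
// Rough cost estimate for Datastore operations, using the prices from the text.
class DatastoreCostSketch {
    static final double PRICE_PER_100K_READS = 0.06;  // USD
    static final double PRICE_PER_100K_WRITES = 0.18; // USD

    // Assumption: every saved entity costs 1 write operation for the entity
    // itself plus indexWritesPerEntity write operations for its indexes.
    static double estimateCost(long entityReads, long entityWrites, int indexWritesPerEntity) {
        long writeOperations = entityWrites * (1L + indexWritesPerEntity);
        return entityReads / 100_000.0 * PRICE_PER_100K_READS
                + writeOperations / 100_000.0 * PRICE_PER_100K_WRITES;
    }

    public static void main(String[] args) {
        // 1 million reads plus 200k entity writes with 2 index writes each.
        System.out.printf("%.2f USD%n", estimateCost(1_000_000, 200_000, 2));
    }
}
```

With these inputs, the reads come to about $0.60 and the 600k write operations to about $1.08, so index writes quickly dominate the write side of the bill.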

API

The API to the Datastore feels very low-level. Therefore, for any serious Java application there is no way around Objectify. It is a library written by Jeff Schnitzer. If Google has not done so already, they should write him a huge cheque for making the App Engine a better place. He wrote it for his own business but the tireless dedication over the years, extensive documentation and support he offers in forums is astounding. With Objectify, working with the Datastore is actually fun.

Here is an example from the documentation:

    @Entity
    class Car {
        @Id String vin;
        String color;
    }

    ofy().save().entity(new Car("123123", "red")).now();
    Car c = ofy().load().type(Car.class).id("123123").now();
    ofy().delete().entity(c);

Objectify makes it really easy to declare entities as simple classes and then takes care of all the mapping between them and the Datastore.

It also has a few tricks up its sleeve. For example, it comes with a first-level cache. This means that whenever you request an entity by key, it first looks into a request-scoped cache whether the entity was already fetched. This can be beneficial for improving performance. However, it can also be confusing because when you fetch an entity and modify it but do not save it, the next read will yield that same cached, modified object. This can lead to Heisenbugs.
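The effect is easy to reproduce with a toy model. The sketch below is not Objectify code; `SessionCacheSketch` and its `Car` class are made up, and two maps stand in for the Datastore and the request-scoped cache.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a request-scoped (first-level) cache in front of a datastore.
class SessionCacheSketch {

    static final class Car {
        String color;

        Car(String color) {
            this.color = color;
        }
    }

    private final Map<String, Car> datastore = new HashMap<>();    // stand-in for the real Datastore
    private final Map<String, Car> sessionCache = new HashMap<>(); // lives for one request

    void save(String id, Car car) {
        datastore.put(id, new Car(car.color)); // store a copy, like serialization would
        sessionCache.put(id, car);
    }

    Car load(String id) {
        Car cached = sessionCache.get(id);
        if (cached != null) {
            return cached; // same instance as before, including unsaved changes
        }
        Car stored = datastore.get(id);
        if (stored == null) {
            return null;
        }
        Car copy = new Car(stored.color);
        sessionCache.put(id, copy);
        return copy;
    }

    // What is actually persisted, bypassing the cache.
    String storedColor(String id) {
        Car stored = datastore.get(id);
        return stored == null ? null : stored.color;
    }
}
```

If you save a red car, load it, set its color to blue without saving and load it again, the second load returns the blue car while `storedColor` still reports red. Code that only ever reads through the cache cannot tell the difference within that request.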

Development & Testing

Since the Datastore is a proprietary cloud database, you cannot just start it locally. When you run your application on your machine, a mock Datastore is started by the SDK. Its behavior comes very close to the production environment. Only the performance is much better, which can be misleading.

For running tests against the Datastore, the SDK is also able to start a local Datastore for you. However, this must be a different implementation since it behaves differently than the one for running the app. This becomes apparent when you realize that a missing multi-index will throw an error when executing the app locally but not when testing the same query. Over the years I accidentally released several queries with missing indexes into production (usually still behind a Beta toggle) - although I had a test for it. After contacting support they admitted the oversight and promised to fix it - more than one year later they still have not.

Backups

Making backups of the Datastore is an atrocious process. There is a manual and an automatic way. Of course, when you have a production application, you’d like to have regular backups. The official way is a feature introduced in 2012 which is still in Alpha!

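By adding an entry to the cron.xml you can start the backup process. The entry contains the names of the entity kinds to back up and the Google Cloud Storage bucket to save them to. When the time comes, it will start a few Python instances with the backup code, iterate over the Datastore and save it to the bucket in some proprietary backup format. Funnily enough, a bucket has a limit on the number of files it can contain, so you had better switch to a new bucket every now and then.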

This is the absolute worst thing about the Datastore.

Memcache

The other crucial way to store data on App Engine is Memcache. By default, you get a shared Memcache. This means it works on a best-effort basis and there is no guarantee how much capacity it will have. There is also the dedicated Memcache for $0.06 per GB per hour.

Objectify is able to use this as a second-level cache. Just annotate an entity with @Cache and it will ask Memcache before the Datastore and save every entity there first. This can have a tremendous effect on performance. Usually Memcache will respond within about 5 ms, which is much faster than the Datastore. I am not aware of any stale cache issue we might have had. So this works very well in production.
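Conceptually, this is a read-through cache. The sketch below models the lookup order with two maps; it is not how Objectify is implemented, and `ReadThroughCacheSketch` with its `datastoreReads` counter is invented to make the effect measurable.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a second-level cache: ask the cache first, fall back to the datastore.
class ReadThroughCacheSketch {
    private final Map<String, String> datastore = new HashMap<>(); // slow, durable
    private final Map<String, String> memcache = new HashMap<>();  // fast, may lose entries
    int datastoreReads; // counts how often the slow path was taken

    void save(String key, String value) {
        datastore.put(key, value);
        memcache.put(key, value);
    }

    String load(String key) {
        String cached = memcache.get(key);
        if (cached != null) {
            return cached;
        }
        datastoreReads++;
        String stored = datastore.get(key);
        if (stored != null) {
            memcache.put(key, stored); // repopulate for the next read
        }
        return stored;
    }

    // A shared Memcache gives no capacity guarantee, so entries can vanish at any time.
    void evictAll() {
        memcache.clear();
    }
}
```

As long as the cache is warm, `datastoreReads` stays at zero. After `evictAll()` every key costs one slow read again, which is roughly what a Memcache outage feels like at scale.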

When Memcache fails, its benefit becomes very obvious. This happened to us about once or twice a year. Our site was barely usable, it was that slow.

BigQuery

BigQuery is a data warehouse as a service, managed by Google. You import data - which can be petabytes - and can run analyses via a custom query language.

It integrates somewhat well with the Datastore since it allows importing Datastore backup files from Google Cloud Storage. I have used this a few times, unfortunately not always successfully. For some of our entities I received a cryptic error. I was never able to figure out what went wrong. But some entities did work. And after fiddling with the query language documentation for a bit, I was able to generate my first insights. Everything considered, it was a nice way to run simple analyses. I definitely would not have been able to do this without writing custom code. But I was not really leveraging the service’s full potential. All the queries I made could have been done in any SQL database directly, our data set was quite small. Only because of the way the Datastore worked did I have to resort to the BigQuery service in the first place.

Monitoring

The Google Cloud Console brings a lot of features to diagnose your app’s behavior in production. Just look at the Google Cloud Console navigation:

Google Cloud Console - Monitoring

This is the result of Google’s acquisition of Stackdriver in 2014. It still feels like a separate, standalone service - but its integration into Google Cloud Console is improving.

Let’s look at the capabilities one by one.

Logging

It is crucial to access an application’s logs quickly and with ease. This is something that was truly painful on App Engine in the beginning. It used to be very cumbersome because it was incapable of searching across all versions of an application. This meant when you were looking for something, you had to know which version was online at the time - or try several, one by one. It was almost unusable. Plus it was extremely slow.

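Since then, they have added useful filters to show only specific modules, versions, log levels, user agents or status codes. It is very powerful. It is still not fast, but compared to the early days it is much better now. This is what it looks like: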

Google Cloud Console - Logging

One very unique idea you can see here is that logs are always grouped by request. In all other tools I have encountered, Kibana for instance, you will only get the log lines that match your search. By always showing all other log lines around the one that matches your search, it gives you more context. I find this extremely helpful when investigating issues in the logs since it immediately helps you to better understand what happened. I truly miss that feature in every other log viewer I use.

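Another interesting feature of App Engine is that every HTTP request is automatically assigned a request ID. It is added to the incoming HTTP request and uniquely identifies it. This comes in handy for correlating a request with its logs. For example, we send an email when an uncaught exception occurs and include the request ID - which makes finding the matching logs a breeze. The same can be done for frontend error tracking.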

Metrics

The Cloud Console gives access to a few basic application metrics. This includes the request volume and latency, traffic volume, memory usage, number of instances and error count. It is useful as a starting point when investigating an issue and when you want to get a quick first impression of the general state of the app.

Here is an example with the app’s request volume:

Google Cloud Console - Chart

Tracing

Since the App Engine instance is a black box, you cannot use other tools to diagnose its performance. If the logging console is not enough, the Trace page provides more detailed data. It allows you to search for the latency distribution of certain requests.

Google Cloud Console - Trace

When you select a specific request, it opens up a timeline. There it displays the remote procedure calls (RPCs) that you cannot see in the logs. Plus, a summary for each RPC by type on the side. By clicking on an RPC, more details, e.g. the response size, are shown.

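This is very helpful for finding the cause of a slow request. In the following example, you can see that the request makes a few fast Memcache calls and one very slow Datastore write operation.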

Google Cloud Console - Analysis

The only problem is that the RPCs do not include enough information to figure out what happened exactly. For instance, the detail view of the Datastore write operation looks like this:

Google Cloud Console - Analysis Detail View

It does not even include the name of the updated entity. This is a huge annoyance and can render this whole screen almost useless. There is just one thing which can help: clicking the ‘Show logs’ button in the upper right corner. It will include the log statements of the request inline, interleaved with the RPCs. This way you might be able to infer more details from the context.

Resources

It is also important to point out that pricing is completely usage-based. This means the cost of your app scales virtually byte by byte, hour by hour and operation by operation. It also means that it is very affordable to get started. There is no fixed cost. If hardly anyone uses your app - since there is a free quota - you do not pay anything.

The biggest item on the bill will most certainly be for the instances, contributing about 80% in my last project. The next big chunk is likely the Datastore read/write cost, 15% of the total cost for us.

There is a nice interface in the Google Cloud Console to keep track of all quotas:

Google Cloud Console - Quotas

To be more specific, when I say ‘all quotas’ I mean all quotas Google tells you about. We actually had an issue where we hit an invisible quota. I think at the time the API may have been in Beta, though. Anyway, one part of our application stopped working and we had no idea why. Luckily, we were subscribed to Google Cloud Support. They informed us about said quota and we had to rewrite a part of our application to make it work again.

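We also had a minor outage due to the confusing pricing settings. At one point, one of our applications suddenly stopped working and just responded with the default error page. It took us many minutes to figure out that we had hit the budget limit we had set. After we raised it, everything just started working again.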

Support

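There is a lot to be said about Google Cloud Support. First of all, without it, we would have been in serious trouble at times. So having it is a must for any mission-critical application - in my eyes. For example, about once a year our application would stop serving requests. We had not done anything to cause this. After contacting Google support, we would learn that they had moved our application to a ‘different cluster’. It just started working again. This is a very scary situation. You cannot do anything but ‘pray to the Google gods’.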

Second of all, it is a hit or miss based on the support person. The quality varied a lot. Sometimes we would need to exchange a dozen messages until they finally understood us. Like any support it can be infuriating. But in the end, they would usually resolve our issue or at least give us enough information to help us resolve it ourselves.

A New Era

Google is working on a new type of App Engine, the Flexible Environment. It is currently in Beta. Its goal is to offer the best of two worlds: the ease and comfort of running on App Engine combined with the flexibility and power of Google Compute Engine. It allows you to use any programming platform (like Java 9!) on any of the powerful Google Compute Engine machines (like 416GB RAM!) while letting Google take care of maintaining the servers and ensuring the app is running fine.

They have been working on this for quite a while already. Of course, we were keen to try it out. So far, we have not been thrilled. But let’s see where Google is taking this.

Design for Scale

Now, you can look at the restrictions the App Engine imposes on your app as annoyances. But bear with me for a moment. App Engine was created by Google. These guys know how to build scalable systems. The restrictions are merely a necessity. They force you to adapt your app to the ways of the Cloud. This is a good thing and should be embraced. If you feel like you are fighting the App Engine, then you are fighting against the ‘new’ rules of the Cloud. This is certainly one lesson I’m taking away from three years on Google App Engine.

Some restrictions and annoyances are the result of neglect by Google, though. It feels like they only invest the bare minimum anymore. Actually, I have had this feeling for the last two years. It is frustrating to work with an ancient tech stack, without any hope of improvement in sight. It is infuriating if there are known issues but they are not fixed. It is depressing to receive so little information on where the platform is heading. You feel trapped.

All in all, I liked how App Engine allowed the development team to focus on actually building an application, making users happy and earning money. Google took a lot of hassle out of the operations work. But the ‘old’ App Engine is on its way out. I do not think it is a good idea to start new projects on it anymore. If App Engine Flexible Environment on the other hand can actually fix its predecessor’s major issues, it might become a very intriguing platform to develop apps on.

Stephan Behnke

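Software developer by trade. Spends most of his time in a persistent pursuit of simplicity and elegance in code. Or just getting things done in between.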
