Geek Igor

Notes on creating a telegram bot

I've created a telegram bot that sends me daily updates about a webpage I was tracking. This is a hobby project, so the hosting price is an important concern.

The problem

I want to be notified about new real estate ads from my neighborhood.

Context

The solution

  1. Write a telegram bot to scrape the ads portal and send notifications about new ads matching the given criteria.

    • I use telegram for some group chats and it’s already present on my phone.
    • It has a nice bot API.
    • A user/bot can mark messages they send as silent so they don’t distract the recipient.
  2. Write the bot in Go.

  3. Host the bot on Google App Engine (GAE) in the Google Cloud (GCP).

    • I don’t want to run the bot on my laptop, and I don’t want to maintain the server at home. We need to go to the CLOUD! 🌥
      • Of course, we can also host a VM on digital ocean (or somewhere similar) for $5 / month ;).
        • Sidenote: If you want to try digital ocean my ref link will give you $100 credits to start (over 60 days; and $25 for me).
      • I wanted to use a managed solution though, mostly to avoid maintenance work on the project as much as possible.
    • Why Google App Engine and not Heroku, or Azure/AWS equivalents?
      • Mostly to get more hands-on experience with GCP.
      • They have a generous free tier and I expect my use case to fit into the limits and cost me $0 / month (or close to it).

Why not open-source the code?

Because of the website I’m scraping. The laws on the subject are complex: scraping for personal purposes is different from showing the public how to do it. I don’t want to go through the hassle of figuring it out in the major jurisdictions. I could take out the scraping part and open-source the rest, but for now, I’m not willing to put more personal time into the project.

I hope you’ll find the notes useful regardless!

Notes on the project

Telegram bot API is really good for experimentation

The docs are really good and easy to work with. There’s no tutorial, so I recommend using VS Code with the REST Client extension to experiment. This is useful later on when you try to debug or understand the API a bit better. The REST Client gives us a nice, interactive, almost REPL-like dev experience.

  1. Talk to BotFather on telegram to create a new bot: [docs].

  2. Start with the following requests:

@token = token-from-botfather
@root = https://api.telegram.org/bot


### Info
GET {{root}}{{token}}/getMe


### Get new messages sent to the bot
GET {{root}}{{token}}/getUpdates


### Respond back to a chat
@chat_id = numeric-chat-id
POST {{root}}{{token}}/sendMessage?chat_id={{chat_id}}&text=Hello


### Check webhook status
GET {{root}}{{token}}/getWebhookInfo

VS code REST client session example

Anyone can chat to your bot

You may want some form of authentication before allowing telegram users to use all the bot features. In my case, the bot asks for a user token before it starts notifying that user about new ads.
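
A token check like this can be tiny. The sketch below is not the bot's actual code: it assumes a hypothetical convention where the user sends "/start <token>", and the function name authorized is made up for the example. It compares tokens with crypto/subtle so the comparison time does not reveal how much of a guess was right.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// authorized reports whether a chat message has the form "/start <token>"
// and carries the expected token. The "/start <token>" convention is a
// hypothetical choice for this sketch.
func authorized(message, wantToken string) bool {
	fields := strings.Fields(message)
	if len(fields) != 2 || fields[0] != "/start" {
		return false
	}
	// Constant-time comparison avoids leaking how many characters matched.
	return subtle.ConstantTimeCompare([]byte(fields[1]), []byte(wantToken)) == 1
}

func main() {
	fmt.Println(authorized("/start s3cret", "s3cret")) // true
	fmt.Println(authorized("/start guess", "s3cret"))  // false
}
```

Once a chat passes the check, you would typically remember its chat ID so the user does not have to authenticate again.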

Google App Engine

I’ve really enjoyed working with the platform. There’s a learning curve, but there’s not a lot of complexity if you think about it as a docker container runner :)

Secrets are hard to manage

As a responsible twelve-factor app citizen you want to provide the config, including secrets, via environment variables. On GAE you put them in the app.yaml file. The problem is that you can’t keep the file in plaintext in your repo: everyone with access to the code will be able to access the secrets as well.

See, e.g. https://cloud.google.com/appengine/docs/standard/go/sending-messages

Add the key to the environment variables section in your app’s app.yaml

Note that anyone with access to your app.yaml will also have access to your API keys. If you can’t store app.yaml in a secure location, we recommend you store the API keys in a secure location such as Datastore and retrieve the keys at runtime, or keep the keys in your code but encrypt them with a keystore, such as Cloud Key Management Service.

This is a far cry from the ease of adding “secret” env variables in other services. E.g. adding secrets to GitHub Actions is just a few clicks in the UI.

The alternative is to use something like git-crypt to transparently encrypt the file in your repo.

Remember, we’re talking about a personal hobby project, not an enterprise deployment where secret management is likely a solved problem.
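
Whichever way the values end up in the environment, the Go side can stay the same. Here is a minimal sketch of reading a secret at startup and failing fast when it is missing. The variable name TELEGRAM_TOKEN and the config type are assumptions made for the example, not something GAE or telegram prescribes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds the settings the bot needs. Only one field is shown here.
type config struct {
	TelegramToken string
}

// loadConfig reads settings through getenv (normally os.Getenv) and returns
// an error if a required value is missing. TELEGRAM_TOKEN is a hypothetical
// variable name.
func loadConfig(getenv func(string) string) (config, error) {
	token := getenv("TELEGRAM_TOKEN")
	if token == "" {
		return config{}, errors.New("TELEGRAM_TOKEN is not set")
	}
	return config{TelegramToken: token}, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Never print the secret itself; its length is enough for a sanity check.
	fmt.Println("token loaded, length", len(cfg.TelegramToken))
}
```

Passing getenv as a parameter keeps the function easy to test without touching the real environment.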

If you don’t know GAE start with the concepts

See https://cloud.google.com/appengine/docs/standard/go/concepts, especially the How Instances are Managed section.

Run it for cheap :)

Note: whenever I comment on pricing in this article, it is based on my understanding of the GCP pricing model. Cloud pricing models are notoriously complex, so do your own research and use the GCP trial to figure out the costs of your project. Don’t blindly follow the advice of some random-person-on-the-internet.

At the time of writing GAE free tier looks like this [src]:

- 28 hours per day of "F" instances
- 9 hours per day of "B" instances
- 1 GB of egress per day
- The Free Tier is available only for the Standard Environment

Put something like this in the app.yaml config file to stay within the limits:

instance_class: F1
automatic_scaling:
  max_instances: 1

Old versions of the app may still be running

When you run gcloud app deploy, GCP builds and deploys a new version of your app. The problem is that the old versions may still be running. Unless you do traffic splitting they probably get 0 traffic, but they may serve existing connections, or they may do some operations in the background. Keep that in mind and delete the old versions when you don’t need them (and avoid async operations; GAE is not designed for that, see the Go concurrency patterns are not that useful section).

Old versions still running

The app is not guaranteed to scale up from 0 to 1 after deployment

No instances running

Don’t forget to clean up the lingering images!

[docs]

Each time you deploy a new version, a container image is created using the Cloud Build service. That container image then runs in the App Engine standard environment. (…) Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don’t need.

The images are stored in a Google Cloud Storage bucket. Actually, I’ve noticed that GCP created two new buckets: one of them had a 15-day lifecycle policy attached, but the other did not.

The built-in observability tools are really good (at least for a personal project)

Structuring the app

Go concurrency patterns are not that useful

Due to autoscaling, our app may be scaled down at any time when it is not serving requests. This means that we can’t rely on goroutines (and channels) to complete operations in the background.

A GAE app should be based on the request-response model. It gets a request, does the processing, and responds to the request.

On the other hand, we prefer to respond to customer requests right away and queue the heavy lifting for async processing (see Queue everything and delight everyone).

GCP provides us with primitives to compensate for the lack of async processing. We can use task queues and cron jobs.

Protecting the internal endpoints

If we use task queues and cron jobs we end up with a few “internal” endpoints. We may not want to let them be called externally.

There’s an easy way to only allow internal traffic from tasks or jobs:

GAE guarantees that such headers will be removed from external requests. The list of the headers removed or overwritten by GCP is [here].

Use firestore to store the state of the app

If there’s a state you want to preserve between restarts of your app (e.g. a list of telegram chats to send the new ads to, a list of already processed ads) you need a database. The disk is out of the question because you don’t get persistent storage in GAE. Firestore is the only DB available in the GCP free tier.

Firestore takes a bit of getting used to. I suggest watching a couple of Get to know Cloud Firestore videos. They are excellent — short and to the point.

Firestore offers an emulator which you can use locally for integration tests:
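
The Google Cloud client libraries talk to the emulator instead of the real service when the FIRESTORE_EMULATOR_HOST environment variable is set. The emulator itself is started separately through the gcloud CLI. Here is a minimal sketch of a guard that refuses to run integration tests unless that variable is present, so they can never touch production data by accident. The function name emulatorHost is made up for the example.

```go
package main

import (
	"fmt"
	"os"
)

// emulatorHost returns the emulator address from FIRESTORE_EMULATOR_HOST and
// whether it was set. getenv is normally os.Getenv; taking it as a parameter
// is a choice made for this sketch so the function is easy to test.
func emulatorHost(getenv func(string) string) (string, bool) {
	host := getenv("FIRESTORE_EMULATOR_HOST")
	return host, host != ""
}

func main() {
	host, ok := emulatorHost(os.Getenv)
	if !ok {
		fmt.Println("FIRESTORE_EMULATOR_HOST not set; skipping integration tests")
		return
	}
	fmt.Println("running integration tests against emulator at", host)
}
```

In a real test suite the same check would sit at the top of each integration test and call t.Skip when the variable is missing.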

Evaluate FaaS as an alternative

If you outsource the state to the database (you have to because the app may be scaled down to 0 nodes), use queues/cron (the app may be scaled down to 0), and avoid initialization code (the app is not guaranteed to run after the deployment), then your app becomes a collection of HTTP request handlers interacting with the DB.

At this point, you may consider using a Function-as-a-Service platform (e.g. GCP Cloud Functions, AWS Lambda).

Summary

A telegram bot is an excellent option as the UI facade for your personal project. You can run it for cheap in Google App Engine. After the initial learning curve, the platform is pleasant to work with and requires little maintenance. I hope this post will give you an idea of the challenges involved in developing for GAE.