
Deploy Station: How We Reached 162 Deploys Per Month

Nicolás Teare

June 4, 2024


At Fintoc, we focus on building an engineering culture where moving fast is part of our daily routine. Here, I’ll walk you through the automations and bots we built to make reviewing PRs and deploying as seamless and fast as possible.


In the early days of computing, machines could only run one program at a time, making software development a slow process. Users had to store their programs and data on punch cards or magnetic tapes, load the program, execute it, debug it, and finally collect the results. Developing and testing software under these conditions was tedious and complex.

Over time, batch processing systems were developed, allowing multiple programs to be written on the same magnetic tape. The computer would read the tape and execute one program after another, improving efficiency but still taking hours or even days to return a result.

Today, computers run many programs in parallel and return results in seconds, which has made software development and testing cycles drastically faster and more efficient.

At Fintoc, we believe that product development should be as fast as technology allows.

Any friction that prevents you from developing or shipping to production is a massive problem to solve. That’s why we’ve invested heavily in improving our engineering team’s development experience. Every pull request that’s ready but not deployed to production is value we’re not yet delivering to our users.

We deploy to production between 6 and 9 times per day. We like to move fast without breaking things, making the process as easy as possible. Achieving this deployment speed required several improvements. Here’s how we built the automations and bots that make PR reviews and deployments fast and smooth.

Step 1: Making PR reviews easier

The first step to fast deploys is making the PR review process seamless.

To achieve this, every time you submit a PR to GitHub, a GitHub Action automatically assigns a size label to it. The labels range from size/XS to size/XXL. If a PR is size/XL or size/XXL, reviewers can request (without reviewing it) that you split it into smaller PRs for faster and easier reviews.

The smaller the PR, the easier it is to understand and review each line of code. If the PR is too large, the reviewer needs more time to go through it, and the owner takes longer to address feedback. Additionally, smaller PRs make it easier to catch bugs and perform reverts.
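The size labeling described above can be sketched as a simple threshold lookup. This is only an illustration: the post names the `size/XS` to `size/XXL` range but not the cut-offs, so the thresholds and the intermediate labels (`size/S`, `size/M`, `size/L`) below are assumptions, as are the function names.

```python
# Hypothetical thresholds (changed lines). The post does not state the real cut-offs.
SIZE_THRESHOLDS = [
    (10, "size/XS"),
    (50, "size/S"),
    (200, "size/M"),
    (500, "size/L"),
    (1000, "size/XL"),
]


def size_label(lines_changed: int) -> str:
    """Return the size label for a PR with the given number of changed lines."""
    for upper_bound, label in SIZE_THRESHOLDS:
        if lines_changed < upper_bound:
            return label
    return "size/XXL"


def should_suggest_split(label: str) -> bool:
    """Per the post, XL and XXL PRs can be sent back to be split before review."""
    return label in {"size/XL", "size/XXL"}
```

A GitHub Action could call something like `size_label` with the PR's added plus deleted lines and apply the resulting label.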

Once you submit your PR, a bot automatically assigns two reviewers, so the owner doesn’t have to waste time deciding who should review it.
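As a rough sketch of that assignment step: the post does not say how the bot chooses reviewers, so the random selection below is an assumption, and `pick_reviewers` is a hypothetical name.

```python
import random
from typing import List, Optional


def pick_reviewers(
    team: List[str],
    author: str,
    count: int = 2,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick `count` distinct reviewers from the team, never the PR author."""
    rng = rng or random.Random()
    candidates = [member for member in team if member != author]
    if len(candidates) < count:
        raise ValueError("not enough eligible reviewers")
    # random.sample returns distinct elements, so no reviewer is picked twice.
    return rng.sample(candidates, count)
```

A real bot might weigh current review load or code ownership instead of picking at random.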

More PRs = More merges

With the team submitting smaller PRs, we ended up with 20 to 30 PRs per day. This introduced a new problem: merging them was becoming slower than it should be.

Our Continuous Integration pipeline runs all of Fintoc’s tests, builds the production image, runs migrations, and more. This multi-step process can take a long time. If someone merges before you, you have to rebase your PR and run the entire CI process again, because two branches that each pass on their own can still break once combined (a semantic conflict).

To solve this, we implemented a merge queue: a queue that automatically merges PRs one after another. We named it Fintoneta (inspired by Scaloneta, the nickname for Argentina’s World Cup-winning team). To merge a PR, all we have to do is comment `fin merge`. The bot then takes care of updating the PR, running CI, and merging it into the main branch if the tests pass. If conflicts arise, it notifies the developer to resolve them.
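To make the idea concrete, here is a minimal sketch of a serial merge queue. It is not Fintoneta's actual implementation; the `PullRequest` type and the `run_ci` and `notify` callbacks are hypothetical stand-ins for the real GitHub and CI integrations.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class PullRequest:
    number: int
    has_conflicts: bool = False  # True when updating from main needs manual work


def process_queue(
    queue: Deque[PullRequest],
    run_ci: Callable[[PullRequest], bool],
    notify: Callable[[PullRequest, str], None],
) -> List[int]:
    """Process queued PRs one at a time, in order. Returns merged PR numbers."""
    merged: List[int] = []
    while queue:
        pr = queue.popleft()
        if pr.has_conflicts:
            notify(pr, "conflicts with main, please resolve them")
            continue
        # A real queue would update the branch with the latest main here,
        # so CI always runs against what will actually be merged.
        if run_ci(pr):
            merged.append(pr.number)
        else:
            notify(pr, "CI failed, not merged")
    return merged
```

Because PRs are handled strictly in order, each one is tested on top of everything merged before it, which is what removes the manual rebase-and-rerun loop.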

*How our Fintoneta merge queue works is a topic for another post that we’ll publish later.*

Another problem, another bot

We use Linear issues to organize our workflow, but manually moving tasks when a PR is merged or deployed to production is a hassle.

That’s why, over two years ago, we started adding the Linear task ID (a tag like [INF-972]) in PR titles. A bot associates the PR with the task. As the PR progresses through in-progress, in-review, staging, and production, the bot moves the task accordingly. (Not a bad idea—Linear later developed this feature and made it part of their app).
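The association between a PR and its task can be sketched with a regular expression over the PR title. The tag format comes from the `[INF-972]` example above; the event names in `EVENT_TO_STATE` and both function names are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Matches a Linear-style tag such as [INF-972] anywhere in the PR title.
TASK_ID_PATTERN = re.compile(r"\[([A-Z]+-\d+)\]")

# Hypothetical mapping from PR events to the four states named in the post.
EVENT_TO_STATE = {
    "draft_opened": "in-progress",
    "review_requested": "in-review",
    "merged": "staging",
    "deployed": "production",
}


def extract_task_id(pr_title: str) -> Optional[str]:
    """Return the task ID found in the title, or None if there is no tag."""
    match = TASK_ID_PATTERN.search(pr_title)
    return match.group(1) if match else None


def task_update(pr_title: str, event: str) -> Optional[Tuple[str, str]]:
    """Return (task_id, new_state), or None when there is nothing to move."""
    task_id = extract_task_id(pr_title)
    new_state = EVENT_TO_STATE.get(event)
    if task_id is None or new_state is None:
        return None
    return task_id, new_state
```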

The bot worked well, but two issues remained in our staging workflow:

The warning that someone was testing in staging notified the entire engineering team and asked them not to deploy to production, but it didn’t actually block the deploy pipeline. If a dev missed the message, they could still deploy, making the warning useless.

You could forget to "release" staging, which affected other devs' work.

As our team grew and we started pushing more PRs, deploying and testing in staging became a hassle. So, we decided to build our developer experience bot, Gilfoyle.

Gilfoyle is born

We created Gilfoyle, a bot named after one of our favorite characters from Silicon Valley (highly recommended if you don’t take it too seriously).

It handles everything from PR classification and deployment automation to making sure staging environments aren’t blocked for too long, streamlining the deployment process and improving PR visibility.

PR classifier

Every time a pull request is created, in addition to adding the size label, Gilfoyle checks which files are affected and assigns labels to provide more context.

The labels we created are:

migration: This PR involves changes to the database model.
internal-api: Changes to serializers between Fintoc’s internal services.
public-api: Changes to the client-facing API.
infra: Changes to infrastructure configuration files like Terraform, Kubernetes, or others.

Additionally, we made it generic so that as we grow or see the need, adding new labels is easy. All it takes is adding a new regex rule to the classifier.
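A regex-driven classifier along those lines could look like the sketch below. The label names come from the list above; the path patterns are guesses at a plausible repository layout, not Fintoc's real rules.

```python
import re
from typing import Iterable, List

# One regex per label. The patterns are illustrative assumptions.
LABEL_RULES = {
    "migration": re.compile(r"(^|/)migrations?/"),
    "internal-api": re.compile(r"(^|/)serializers/internal/"),
    "public-api": re.compile(r"(^|/)api/public/"),
    "infra": re.compile(r"\.tf$|(^|/)(k8s|kubernetes)/"),
}


def classify(changed_files: Iterable[str]) -> List[str]:
    """Return the sorted labels whose pattern matches at least one changed file."""
    files = list(changed_files)
    return sorted(
        label
        for label, pattern in LABEL_RULES.items()
        if any(pattern.search(path) for path in files)
    )
```

Adding a new label is then a one-line change: a new entry in `LABEL_RULES`.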

Deploy from anywhere

Of course, Gilfoyle allows you to deploy from Slack. To encourage deployments and make deploying at Fintoc as easy as possible from day one, we created deploy <repo>, a command that tells you which PRs are ready to be deployed in the selected repository and lets you deploy through a super-fast interface. It’s so seamless that you can even deploy from your phone.

@gilfoyle deploy `repo`

Gilfoyle always notifies the team when a deploy is made to staging or production. For every merged PR that gets deployed, it posts a notification to a channel. The message first shows that the deploy is in progress and is later updated to reflect the final status. When a deploy to staging is successful, it looks like this:

Some additional features we added to the bot:

It notifies when a deploy fails, including a link to the PR.
It uses PR labels to display an emoji indicating the nature of the PR, making it easy to identify critical PRs (like the one above 🐘, which has a database migration).
It alerts the PR owners whether the deploy was successful or not.
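The label-to-emoji idea can be sketched as a small lookup used when building the notification text. Only the 🐘 for migrations comes from the post; the other emoji, the message wording, and the `deploy_message` name are assumptions.

```python
from typing import Iterable

LABEL_EMOJI = {
    "migration": "🐘",   # mentioned in the post
    "public-api": "🌐",  # assumption
    "infra": "🔧",       # assumption
}


def deploy_message(
    pr_title: str, labels: Iterable[str], environment: str, succeeded: bool
) -> str:
    """Build the notification text, prefixed with one emoji per known label."""
    emojis = "".join(LABEL_EMOJI[label] for label in labels if label in LABEL_EMOJI)
    prefix = f"{emojis} " if emojis else ""
    status = "deployed to" if succeeded else "failed to deploy to"
    return f"{prefix}{pr_title} {status} {environment}"
```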

Encouraging deployments

Gilfoyle also monitors the number of PRs accumulated in staging, and if there are more than 4 pending, it triggers an alert prompting you to deploy.

If you push large batches of code to production at once, you run a higher risk of things breaking. The smaller each change, the easier it is to identify what went wrong and fix it quickly.

For example, if you deploy 10 PRs at once, any one of them could fail, and you'd have to check all of them to find out what went wrong. On the other hand, if you deploy every 1 or 2 PRs, many of those risks are mitigated, and you have more control to monitor that everything is working fine in production. Plus, it makes rollbacks much easier to execute.
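The pending-PR check itself is tiny. In this sketch the threshold of 4 comes from the post, while the function name and the message wording are assumptions.

```python
from typing import Optional, Sequence

MAX_PENDING_PRS = 4  # from the post: alert when more than 4 PRs are pending


def pending_alert(repo: str, pending_prs: Sequence[int]) -> Optional[str]:
    """Return an alert message when too many PRs are waiting in staging."""
    if len(pending_prs) <= MAX_PENDING_PRS:
        return None
    return f"{repo}: {len(pending_prs)} PRs are waiting in staging. Time to deploy!"
```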

@gilfoyle block `repo`

At Fintoc, if your code is in staging, it’s deployable to production. Anyone can do it, and it’s your responsibility to ensure it doesn’t get deployed if you’re not ready.

Gilfoyle allows you to block and unblock a repository if you're testing sensitive changes in staging, preventing anyone from deploying your code to production without your permission. If you need to test something critical in staging, you can use block deploy <repo>, and Gilfoyle will notify everyone that you're using that repository. It also disables the deploy station, so no one can deploy until you unblock it. (You can also configure this manually from the PR by adding the block-deploy label.)

However, it periodically reminds you that you have it blocked (so it doesn’t stay locked for too long) and also notifies you if someone else is trying to deploy to production, so you can hurry up. 😉
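The block and unblock behavior can be modeled as a small piece of state keyed by repository. This is a sketch under assumptions: the `DeployStation` class, its method names, and the message texts are hypothetical, and the periodic reminders are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DeployStation:
    # Maps a blocked repository to the user who blocked it.
    blocked_by: Dict[str, str] = field(default_factory=dict)

    def block(self, repo: str, user: str) -> None:
        self.blocked_by[repo] = user

    def unblock(self, repo: str) -> None:
        self.blocked_by.pop(repo, None)

    def request_deploy(self, repo: str, user: str) -> Tuple[bool, str]:
        """Return (allowed, message). A blocked repo refuses and pings the blocker."""
        blocker = self.blocked_by.get(repo)
        if blocker is None:
            return True, f"{user} is deploying {repo} to production"
        return False, f"@{blocker}: {user} wants to deploy {repo}, which you have blocked"
```

The key difference from a plain warning message is that `request_deploy` actually refuses the deploy while the block is active.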

Gilfoyle also marks the block message with a ✅ once the repository has been unblocked, giving more context to anyone reading it.

Should I deploy today?

Fintoc's engineering culture is all about moving fast and doing things right.

We love these tools because, besides giving visibility to both your work and others', they reflect this mindset. Our goal is for a dev to be able to deploy as quickly as possible from day one at Fintoc. In fact, Felipe (the latest dev to join Fintoc) deployed to production 5 times and merged 11 PRs in his first 2 weeks.

With these and similar tools, we’ve reached an average of 9 deploys per day and 20 to 30 PRs deployed daily, with just 14 devs on the team.

And if anyone ever feels hesitant or needs a little motivation, I made this:

🔗 https://shouldideploy.fintoc.com 🚀

Based on https://shouldideploy.today.

Growing the team doesn’t necessarily mean slowing down, and that’s something we always keep in mind. Expanding the team should mean moving faster, which pushes us to continuously invest in improving our engineering processes and tools.

