RBL-Checker: details on the refactoring

A long time ago, I wrote RBL-Checker, a tool to check whether an IP range is blacklisted.

I still use it, but I need something more complete that integrates better with my use case.

In this post, you’ll learn:

  • my use case (current and future)
  • my choice of style for writing code and tools

My use case

I wrote the first version at home while I was working for a hosting provider. The company never saw the ROI of building this kind of small tool.

I can't say I was fond of that manager when I had to debug an email server with a very long queue and no metrics, no blacklist detection, and nothing but raw logs to read, without ELK or anything similar.

Right now, I still run my own email servers, so I use my software for them and also for any clients who need monitoring and alerting.

My future use case could be a non-profit IAP, where it's very important to keep IP ranges clean and to be alerted when an IP becomes blacklisted.

My choice

My first version was in Python and I'm staying with it.

From monolith to 2 services

I moved from a single service that used subprocess to run checks concurrently to 2 services, because I need to check a large range of IPs very quickly.

Consumer

An asynchronous service that takes a message from the queue and checks the IP against the list of blacklists.
Finally, it stores each blacklisted IP in the database with the blacklist name and the date and time of the check.
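To make the consumer's job more concrete, here is a minimal sketch of the check itself. It is not the actual RBL-Checker code: it assumes IPv4 only, the usual DNSBL convention (reverse the octets, append the blacklist zone, and treat a successful DNS answer as "listed"), and the names `dnsbl_query_name`, `resolve`, and `check_ip`, plus the record fields, are hypothetical. Reading from the queue and writing to the database are left out.

```python
import asyncio
import ipaddress
import socket
from datetime import datetime, timezone


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


async def resolve(name):
    """Return True if the name resolves, which is read here as 'listed'.

    Any resolution failure (including timeouts) counts as 'not listed',
    which is a simplification for this sketch.
    """
    loop = asyncio.get_running_loop()
    try:
        await loop.getaddrinfo(name, None, family=socket.AF_INET)
        return True
    except socket.gaierror:
        return False


async def check_ip(ip, zones, resolver=resolve, limit=50):
    """Check one IP against every zone and return one record per listing."""
    semaphore = asyncio.Semaphore(limit)  # cap the number of in-flight lookups

    async def check_zone(zone):
        async with semaphore:
            listed = await resolver(dnsbl_query_name(ip, zone))
        if not listed:
            return None
        # Assumed record shape: IP, blacklist name, UTC time of the check.
        return {
            "ip": ip,
            "blacklist": zone,
            "checked_at": datetime.now(timezone.utc),
        }

    results = await asyncio.gather(*(check_zone(zone) for zone in zones))
    return [record for record in results if record is not None]
```

Calling `asyncio.run(check_ip("192.0.2.10", ["bl.example"]))` would run one lookup per zone; `bl.example` is a placeholder zone, not a real blacklist. The `resolver` parameter is injectable so the logic can be exercised without touching the network.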

Rest API

I use it exclusively to submit a range, which publishes a message to the queue.
Maybe later, I'll add an endpoint that reports how many messages are in the queue.
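As a rough illustration of what happens between "a range comes in" and "a message goes out", here is a small sketch using only the standard library. The function name `range_to_messages`, the one-message-per-address layout, the `{"ip": ...}` payload, and the size limit are all assumptions for the example, not the actual message format; the FastAPI endpoint and the Kafka producer call are omitted.

```python
import ipaddress
import json


def range_to_messages(cidr, max_addresses=65536):
    """Validate a CIDR range and return one JSON-encoded message per address."""
    # strict=False accepts input such as 192.0.2.5/30 and normalises it
    # to the enclosing network (192.0.2.4/30).
    network = ipaddress.ip_network(cidr, strict=False)
    if network.num_addresses > max_addresses:
        raise ValueError(
            f"{cidr} has {network.num_addresses} addresses, limit is {max_addresses}"
        )
    # Iterating the network yields every address, including the first and last.
    return [json.dumps({"ip": str(addr)}).encode("utf-8") for addr in network]
```

For example, `range_to_messages("192.0.2.0/30")` yields four payloads, one for each address from 192.0.2.0 to 192.0.2.3. The size limit is there so that a mistyped prefix cannot flood the queue.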

The biggest move would be a migration to Haskell, but right now I don't need it.

Possible big change

Migrating from JSON to a binary format. In that case, I see these possibilities:

The stack

AIOPG

Unlike plain psycopg2, AIOPG is asynchronous: it wraps psycopg2 for use with asyncio.

I use PostgreSQL and Yugabyte. Both work with PostgreSQL drivers, so I chose to stay with a classic database that can be pushed to production very easily and won't be a pain.

Also, public cloud providers offer managed PostgreSQL (like Amazon RDS for PostgreSQL), so it's easy to deploy.

FastAPI

One year ago, I left Flask for FastAPI.

It lets me build an async REST API and provides OpenAPI documentation with Swagger UI and ReDoc by default.

Deploying Python-based software is very easy, and it runs on all the common CPU architectures (ARM, RISC-V, x86).

Kafka

I already have a Kafka cluster running, so I'm keeping my current stack and won't add a new service (NATS & co) just for this small job.

I enabled lz4 compression on the producer to save disk space.

Some cloud providers offer managed Kafka or Kafka-compatible services, so it's very easy to deploy and you don't need to manage it yourself.