Sep
30

Duplicate Celery logs in a Flask app

posted on 30 September 2020 in programming

Celery and Flask go together like tacos and Tuesdays, so I was surprised to run into some incompatibility in the way they setup and use loggers which caused Celery to dump duplicate logs. Perhaps it was just the way I had structured my project, using a Flask application factory and Celery tasks in a tasks folder, but that seems pretty standard. Or maybe it’s how I’m setting the log level? But I digress.

The problem is that Celery is producing duplicate logs, like this:

[2020-09-30 07:48:54,638: WARNING/ForkPoolWorker-3] [2020-09-30 07:48:54,637] INFO in cleaner: Cleaning up 0 expired alerts
[2020-09-30 07:48:54,637: INFO/ForkPoolWorker-3] Cleaning up 0 expired alerts
[2020-09-30 07:48:57,050: WARNING/ForkPoolWorker-5] [2020-09-30 07:48:57,049] DEBUG in alerts: Evaluating 2 alerts
[2020-09-30 07:48:57,049: DEBUG/ForkPoolWorker-5] Evaluating 2 alerts

After stepping through the code in Celery’s app/log.py and Flask’s logging.py it turns out that Celery is trying to reconfigure the logger hierarchy, including adding a catch all root logger, but the standard python logging library updates the heirarchy whenever a new logger is added, based on the logger name (e.g. myapp.tasks.alerts), and Flask lazy loads it’s logger so it clobbers some of the Celery setup.

Let’s cover that in more detail.

Within my Celery task module, I get the logger as a global variable.

# @myapp/tasks/alerts.py
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

Celery’s get_task_logger() function essentially wraps the standard logging.getLogger() call and then changes the parent to be the “celery.task” logger (source).

# @celery/utils/log.py
def get_task_logger(name):
    """Get logger for task module by name."""
    if name in RESERVED_LOGGER_NAMES:
        raise RuntimeError(f'Logger name {name!r} is reserved!')
    return _using_logger_parent(task_logger, get_logger(name))

Celery gives the new logger a logging.NullHandler, so anything written to this loger will propagate up to the parent logger (“celery.task”) to be handled. This logger uses a custom Celery TaskFormatter which print out helpful information about the task name and id that it was running in. This is what we want.

[2020-09-30 18:53:48,142: DEBUG/ForkPoolWorker-3] myapp.tasks.alerts.evaluate_alerts[4ea6cd37-932f-4286-89f2-1aa4ba5ee2ed]: Evaluating 2 alerts

So if nothing else happened, our logger would be wired up to Celery’s “celery.task” logger and it would print once with the task id. Unfortunately this wasn’t happening for me, and the reason was because in my Flask create_app() function, I was setting the logging level.

def create_app(test_config: Optional[Dict[str, Any]] = None) -> Flask:
    ...
    app.logger.setLevel(app.config.get("LOG_LEVEL", logging.WARNING))

Why would this matter? Because app.logger is a lazy loaded property, meaning now that I access it, it initialises the Flask logger, using the name of our project.

# @flask/app.py
@locked_cached_property
def logger(self):
    return create_logger(self)

# @flask/logging.py
def create_logger(app):
    logger = logging.getLogger(app.name)

    if app.debug and not logger.level:
        logger.setLevel(logging.DEBUG)

    if not has_level_handler(logger):
        logger.addHandler(default_handler)

    return logger

Now, inside python’s logging.getLogger, it updates the loggers parent, and tries to update any existing loggers that should now be the children of this new logger based on the naming convention.

# @logging/__init__.py
def getLogger(self, name):
    ...
    self._fixupChildren(ph, rv)
    self._fixupParents(rv)

So, when we create a new flask logger called "myapp", our task logger "myapp.tasks.alerts", which used to have "celery.task" as it’s parent, now gets updated to have our new flask logger as it’s parent, so we lose the task id and name. But why do we get 2 logs? Well, Celery helpfully created a root logger as a catch all, and by default loggers propagate their log events to the parents as well, so both our Flask app logger, and the root logger see the event and write it. This doesn’t happen with the “celery.task” logger because it’s configured not to propagate events. Here’s a tree view to visualise it.

What we want

myapp.tasks.alerts(NullHandler, propagate=True)
└─── celery.task(StreamHandler, propagate=False) *Writes log and stops
    └─── celery(StreamHandler, propagate=False)
        └─── root(StreamHandler, propagate=False)

What we ended up with

myapp.tasks.alerts(NullHandler, propagate=True)
└─── myapp(StreamHandler, propagate=True) *Writes log and propagates
    └─── root(StreamHandler, propagate=False) *Writes log

So, what’s the solution? Well I messed around with trying to reset my task loggers back to have “celery.task” as a parent, but it was overridden again. The simple solution I ended up with was just giving my task loggers a name that wasn’t part of the heirarchy.

logger = get_task_logger("alerts")

Yeah, that simple.