Skip to content

asgiref.local.Local creates reference cycles that require garbage collection to be freed when executed in a sync context #487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
patrys opened this issue Jan 24, 2025 · 8 comments · May be fixed by #508

Comments

@patrys
Copy link

patrys commented Jan 24, 2025

We use Django, which uses asgiref.local.Local for its connection management. While debugging memory buildup, I've noticed that thread-critical Locals create reference cycles when used in a sync context.

Steps to reproduce

  1. Disable garbage collection
  2. Set garbage collectors debug mode to DEBUG_LEAK
  3. Create a Local variable in synchronous context
  4. Try to read an inexistent attribute from said Local
  5. Force garbage collection and look at its output
import gc
from asgiref.local import Local

l = Local(thread_critical=True)
gc.collect()  # make sure there is no lingering garbage
gc.disable()
gc.set_debug(gc.DEBUG_LEAK)
try:
    getattr(l, "missing")
except AttributeError:
    pass
gc.collect()
gc.set_debug(0)

Explanation

When Django tries to reference a database connection that is not yet established, it executes something like this (paraphrasing):

from asgiref.local import Local

connections = Local()

def get_connection(alias: str):
    try:
        return getattr(connections, alias)
    except AttributeError:
        conn = create_new_connection(alias)
        setattr(connections, alias, conn)
        return conn

Now, internally, asgiref's Local does this:

def __getattr__(self, key):
    with self._lock_storage() as storage:
        return getattr(storage, key)

@contextlib.contextmanager
def _lock_storage(self):
    if self._thread_critical:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            yield self._storage
        else:
            ...
    else:
        ...

Now, putting everything together:

  1. Django calls getattr on the Local object
  2. The _lock_storage context manager is entered
  3. It attempts to find the asyncio.get_running_loop(), which raises a RuntimeError
  4. The exception handler yields self._storage (at this point, we're still in the exception handler inside the context manager)
  5. Local executes getattr on storage, which raises an AttributeError
  6. The AttributeError is propagated back to the context manager and since it's in the exception handler, it's linked to the previous RuntimeError (Python assumes the AttributeError was raised while attempting to handle the RuntimeError)
  7. At this point, both exceptions hold each other referenced (one through exception chaining, the other through the traceback)
  8. They also hold everything up to the point in my code where I attempted to use the database connection referenced, preventing those objects from being freed as well

Potential solution

Changing the _lock_storage implementation to the following fixes the issue:

@contextlib.contextmanager
def _lock_storage(self):
    if self._thread_critical:
        is_async = True
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            is_async = False
        if not is_async:
            yield self._storage
        else:
            ...  # remaining code unchanged
    else:
        ...
@andrewgodwin
Copy link
Member

Which version of asgiref is this against? There is a potential fix for this in main that has not yet been in a public release.

@patrys
Copy link
Author

patrys commented Jan 24, 2025

@andrewgodwin I noticed this in asgiref 3.8.1, but I checked that this code path was the same on main.

@SebCorbin

This comment has been minimized.

@SebCorbin

This comment has been minimized.

@carltongibson
Copy link
Member

@patrys Thanks for the report. I've finally got some space in the upcoming window to sit down with asgiref. Can I ask...

Disable garbage collection... in the exception handler inside the context manager ... At this point, both exceptions hold each other referenced.

The key bit in the flow is the first one there right? (Or not?) — "Disable garbage collection". Python's GC can be a little slow shall we say, but it does work in the end no? Just trying to understand to effect here. (Independently of what to conclude)

@SebCorbin: Please do open a fresh one. There are a few related issues around 3.8 that need investigation. I'd rather have a duplicate than it slip through the gaps. (If you could minimise your example, somehow, that might be handy!) Thanks.

@patrys
Copy link
Author

patrys commented Apr 10, 2025

@carltongibson The gc does work eventually. I recommend disabling gc as a step to make sure there is no race condition during reproduction. We keep it enabled in production, but we also monkey-patch asgiref as the memory buildup prior to the next garbage collection cycle can get the entire process terminated (by the kernel OOM killer), and the collection process itself is very slow (and synchronous) with a lot of interlinked objects to visit.

@carltongibson
Copy link
Member

@patrys Makes sense. Thanks.

... we also monkey-patch...

I see that here: saleor/saleor#17594

Would you like to open a PR here to upstream that?

@fowczarek
Copy link

@carltongibson, I will handle it.

fowczarek added a commit to fowczarek/asgiref that referenced this issue Apr 11, 2025
@fowczarek fowczarek linked a pull request Apr 11, 2025 that will close this issue
fowczarek added a commit to fowczarek/asgiref that referenced this issue Apr 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants