
[Fatal Bug]: Nodes Missing Community and Level Attributes Are Excluded During Query Execution #1808


Open
3 tasks done
IT-Bill opened this issue Mar 14, 2025 · 0 comments
Labels
bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer


IT-Bill commented Mar 14, 2025

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

In V1.0.0, after running the Leiden algorithm in layout_graph, some nodes were not assigned a community or a level. As a fallback, the code gave these nodes a pseudo community and level. This approach was not ideal, but it did not introduce a fatal bug, because the unassigned nodes were still retained.

In V2.0.0, the handling of nodes changed:

  • Nodes are no longer saved persistently but are instead generated dynamically during queries.
  • The previous fallback mechanism (assigning pseudo community and level) was removed, meaning that nodes without a level attribute (level=None) are left unprocessed.
  • This causes an issue in _filter_under_community_level: nodes with level=None are wrongly discarded, so approximately 10% of the graph's nodes are lost.
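
A minimal sketch of why such rows disappear, assuming the filter is a plain pandas comparison on the level column. The function name filter_under_level and the sample data are hypothetical and are not taken from the GraphRAG source; the point is only that a comparison against a missing value evaluates to False, so the row is dropped without any error:

```python
import pandas as pd


def filter_under_level(df: pd.DataFrame, max_level: int) -> pd.DataFrame:
    """Hypothetical stand-in for a level filter: keep rows with level <= max_level."""
    return df[df["level"] <= max_level]


# Sample nodes; "ISOLATED" has no level or community (stored as NaN by pandas).
nodes_df = pd.DataFrame(
    {
        "title": ["A", "B", "C", "ISOLATED"],
        "level": [0.0, 1.0, 3.0, None],
        "community": [0.0, 2.0, 5.0, None],
    }
)

kept = filter_under_level(nodes_df, max_level=2)

# "C" is removed on purpose (level 3 > 2); "ISOLATED" is removed as a side effect,
# because NaN <= 2 is False.
print(kept["title"].tolist())
```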

To confirm the issue, I saved nodes_df before and after _filter_under_community_level and observed that these unassigned nodes were filtered out incorrectly.
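
The before/after comparison can be sketched roughly as follows. This assumes both snapshots are pandas DataFrames that share a unique key column; the column name title, the helper dropped_nodes, and the sample data are assumptions for illustration, not part of GraphRAG:

```python
import pandas as pd


def dropped_nodes(
    before: pd.DataFrame, after: pd.DataFrame, key: str = "title"
) -> pd.DataFrame:
    """Return rows present in `before` but absent from `after`, matched on `key`."""
    return before[~before[key].isin(after[key])]


# Stand-ins for nodes_df saved before and after the filtering step.
before = pd.DataFrame({"title": ["A", "B", "ISOLATED"], "level": [0.0, 1.0, None]})
after = before[before["level"] <= 2]

lost = dropped_nodes(before, after)
lost_unassigned = int(lost["level"].isna().sum())
print(f"dropped {len(lost)} of {len(before)} nodes; {lost_unassigned} had no level")
```

If most of the dropped rows have a missing level, the loss comes from unassigned nodes rather than from the intended level cutoff.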

This results in significant data loss and affects query results, making it a critical bug that needs to be addressed.

Workaround

To mitigate the issue temporarily, you can add the following code snippet before the filtering step. This workaround assigns default values to missing level and community attributes, ensuring that nodes are not inadvertently discarded:

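# Assumes nodes_df is the pandas DataFrame of nodes that is passed to the filtering step.
# Give unassigned nodes level 0 so the level filter keeps them.
nodes_df["level"] = nodes_df["level"].fillna(0).astype(int)
# Mark unassigned nodes with the placeholder community -1.
nodes_df["community"] = nodes_df["community"].fillna(-1).astype(int)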


Steps to reproduce

  1. Create or load a graph that includes isolated nodes (nodes not connected to any other node).
  2. Run indexing.
  3. Run a local search query.
  4. Observe that nodes with a missing level attribute (level=None) are discarded during filtering, so approximately 10% of the nodes are lost.


Expected Behavior

  • All nodes, including isolated ones, should either be assigned a valid community and level or be handled in a way that retains them in the dataset.
  • The filtering function should not discard nodes with a missing level attribute unless explicitly intended, thereby preventing unintended data loss.
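
One way the second point could look, as a sketch only: a filter that applies the level cutoff but explicitly keeps rows whose level is missing. The function name filter_keep_unassigned and the sample data are hypothetical, and this is not a proposed patch to _filter_under_community_level:

```python
import pandas as pd


def filter_keep_unassigned(df: pd.DataFrame, max_level: int) -> pd.DataFrame:
    """Keep rows at or below max_level, plus rows whose level is missing."""
    mask = (df["level"] <= max_level) | df["level"].isna()
    return df[mask]


nodes_df = pd.DataFrame(
    {"title": ["A", "C", "ISOLATED"], "level": [0.0, 3.0, None]}
)

retained = filter_keep_unassigned(nodes_df, max_level=2)

# "C" is still filtered out by the cutoff; "ISOLATED" is retained.
print(retained["title"].tolist())
```

Unlike filling in default values, this leaves the missing level visible downstream, so later steps can still tell assigned and unassigned nodes apart.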

GraphRAG Config Used

No response

Logs and screenshots

No response

Additional Information

@IT-Bill IT-Bill added bug Something isn't working triage Default label assignment, indicates new issue needs reviewed by a maintainer labels Mar 14, 2025