How are community reports structured across hierarchy levels? #1850

Closed
bykimby opened this issue Mar 30, 2025 · 5 comments
Labels: autoresolved, awaiting_response, stale

Comments


bykimby commented Mar 30, 2025

I was going through the code and had a question about the "From Local to Global" paper. From my understanding, the paper generates community reports at both the leaf level and higher (intermediate) levels.

For leaf-level communities, the report includes all the nodes and edges. At the intermediate levels, it includes all the nodes and edges from its sub-communities. However, if this information exceeds the context window, the system uses a summarized version (i.e., the community report of the lower-level community) instead.

Is this understanding correct? If so, could you tell me which file and prompt implement this behavior? And if the implementation has changed from the original paper, could you let me know how it works now?


bykimby commented Mar 30, 2025

Also, when generating community reports at the leaf level, the system includes the entity and relation IDs that each finding is based on, which allows for traceability.

However, if the context window is exceeded and the entity and relation information is replaced with a summary at the intermediate level (i.e., the community report), does that mean this traceability is lost?


bykimby commented Mar 31, 2025

When generating a summary from leaf-level community information, how were the edges selected? (Could you please answer this question based on your paper?)

  1. Did you include only the nodes and edges that both belong to the community?
  • That is, were edges included only if both endpoint nodes are within the community?
  2. Or did you include only the nodes that belong to the community, plus all edges connected to those nodes, even if they connect to nodes outside the community?
  • That is, were external connections also included?

@natoverse
Collaborator

This is correct. The community context building happens here: https://github.com/microsoft/graphrag/blob/main/graphrag/index/operations/summarize_communities/build_mixed_context.py

Note that sufficiently large contexts will end up truncated even with sub-summarization.
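
To make the fallback and truncation behavior concrete, here is a minimal sketch of the idea as described in this thread. It is not the code in `build_mixed_context.py`: the `SubCommunity` class, `count_tokens`, and `build_context` are hypothetical names, whitespace splitting stands in for a real tokenizer, and substituting the largest sub-communities first is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SubCommunity:
    # Hypothetical container for illustration; not a graphrag type.
    community_id: str
    local_context: str  # raw node/edge descriptions for this sub-community
    report: str         # previously generated (shorter) community report


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def build_context(subs: list[SubCommunity], max_tokens: int) -> str:
    # Start with the full local context of every sub-community.
    parts = {s.community_id: s.local_context for s in subs}

    def total() -> int:
        return sum(count_tokens(p) for p in parts.values())

    # Swap in reports, largest local context first, until the budget is met.
    by_size = sorted(subs, key=lambda s: count_tokens(s.local_context), reverse=True)
    for sub in by_size:
        if total() <= max_tokens:
            break
        parts[sub.community_id] = sub.report

    # Even with every report substituted, the result may be too long: truncate.
    tokens = " ".join(parts[s.community_id] for s in subs).split()
    return " ".join(tokens[:max_tokens])
```

In this sketch, a sub-community whose report was substituted contributes only its summary text, so per-entity and per-relationship detail for that sub-community is no longer present in the parent's context.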

For community building we only include edges that reside entirely within the community. We include all nodes. This happens here: https://github.com/microsoft/graphrag/blob/main/graphrag/index/workflows/create_communities.py
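
As an illustration of that edge rule (again a sketch, not the code in `create_communities.py`; `community_edges` and the tuple-based edge representation are assumptions made for the example):

```python
def community_edges(
    edges: list[tuple[str, str]], members: list[str]
) -> list[tuple[str, str]]:
    # Keep an edge only when both endpoints belong to the community.
    member_set = set(members)
    return [(src, dst) for src, dst in edges if src in member_set and dst in member_set]


edges = [("A", "B"), ("B", "C"), ("C", "D")]
members = ["A", "B", "C"]

# ("C", "D") is dropped because "D" is outside the community;
# all member nodes are kept regardless of whether they have internal edges.
internal = community_edges(edges, members)
```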

natoverse added the awaiting_response label Apr 8, 2025

This issue has been marked stale due to inactivity after repo maintainer or community member responses that request more information or suggest a solution. It will be closed after five additional days.

github-actions bot added the stale label Apr 16, 2025

This issue has been closed after being marked as stale for five days. Please reopen if needed.

github-actions bot closed this as not planned Apr 21, 2025