-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Reduce memory consumption of graph (pipe sets) #4498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesFootnotes
|
42280e9
to
38ca388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some substantial memory saving which is great. I'll need to test this out a bit because I'm worried about nil pointer exceptions.
It's a shame this change requires so many sweeping changes but I can't think of a way to not require sweeping changes of some form. I did some investigating into alternative approaches e.g.:
- using
[20]byte
instead ofstring
(much better than string, but doesn't address duplicates) - Having an immutable commit struct which itself lives in a pool. This would represent the actual commit and nothing independent of it like tags. Then each commit in the state can have a pointer to the corresponding immutable commit. You could have the immutable commit have its parents as just pointers to other immutable commits.
- Storing a reference to the pool inside the commit and using
.Hash()
to encapsulate how to resolve the actual string. Unfortunately means bloating each commit with an 8 byte pointer.
Did you have any thoughts on the above?
I did consider making the Hash property of Commit private (lowercase) and exposing a getter. When doing this in an extra commit, the actual change of turning it into a pointer would require much fewer changes. I didn't do that just because currently all fields are public, but I'm totally open to changing this either just for this field, or for all.
I don't see how this addresses the problem of the duplicate hash strings; unless you change the Pipes to have pointers to commits instead of pointers to hashes, is that what you meant? But then we need a map of commits by hash, so that we can look up the commit if we only have the hash. I'm not sure I like this direction.
I'm not sure what you mean by "resolve the actual string", and why you need a reference to the pool for that. The |
I meant that the
Yep that's what I meant: a map of hash to commit. I like the idea because it would allow us to do some more stuff in memory without having to shell out to git but it's a big piece of work and I'm not clear on the overall value. |
Still don't get it. We'd have a |
When I said above "The .Hash() getter would just return the pointer", I meant the dereferenced pointer of course. |
Ah yes, I mistakenly thought you'd need to access the pool to deference. Well in that case I actually very much like the idea of having a |
These are not the expected commits.
The "// merge commit" comment was plain wrong, this is any commit that has a parent, merge or not. The "else if" condition was unnecessary, a plain "else" would have been enough. But the code in the two blocks was almost identical, so extract the one thing that was different and unify it. And while we're at it, use IsFirstCommit() instead of counting parents.
We want to unexport Parents in a later commit.
This is in preparation for turning the hash into pointer to a string.
This in itself is not an improvement, because hashes are unique (they are shared between real commits and rebase todos, but there are so few of those that it doesn't matter). However, it becomes an improvement once we also store parent hashes in the same pool; but the real motivation for this change is to also reuse the hash pointers in Pipe objects later in the branch. This will be a big win because in a merge-heavy git repo there are many more Pipe instances than commits.
This is exactly the same as what we did for Hash earlier. And for the same reason: we want to turn the parents field into a slice of pointers.
We need to pass %P instead of %p in the format string of the git log command, so that the parent hashes have the full length and can be shared with the real hashes.
Change the base type of some of our enums from int to uint8, and reorder fields for better packing. This reduces the size of models.Commit from 152 to 132 bytes on my machine. This doesn't improve overall memory usage significantly, but why not save a little bit of memory if it's easy.
Now that commit hashes are stored in a pool and referenced by pointer by the commits, we can use those same pointers in the pipes.
Now that all hashes that we deal with are stored in the same pool, we can simply compare their addresses.
This saves some memory at the cost of a slight performance increase (I suppose reallocting the slice when adding new Pipes is slightly more expensive now). Performance of the BenchmarkRenderCommitGraph benchmark is 130μs before, 175μs after. I'm guessing this is still acceptable.
The instances are held by the AuthorStyle cache.
…king Hopefully, graphs will never get wider than 32768 characters. (They would get kind of hard to navigate if they did...) This reduces the size of the Pipe struct from 48 to 32 bytes, which makes a significant difference when there are many millions of instances.
38ca388
to
ff3d700
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesFootnotes
|
@jesseduffield I reworked the branch completely, please have another look. I'm quite happy with this now. |
As a followup to #2533, reduce the memory consumption some more by optimizing the storage of the pipe sets used for the commit graph.
Some coarse measurements (taken by opening the respective repo, typing
4 >
, and waiting for all commits to be loaded, and then reading the Memory column in Activity Monitor):It's still not really usable for the linux kernel, but for all other repos that I come across in my daily use of lazygit, it works quite well now.