barjin opened this issue on Apr 14, 2025 · 0 comments
Labels
feature: Issues that represent new features or improvements to existing features.
t-tooling: Issues with this label are in the ownership of the tooling team.
barjin added the feature label on Apr 14, 2025
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
While the current KVS implementation can work with Node.js streams (e.g. in the `setValue` method), the interface leaves a lot to be desired.
See e.g. this `FileDownload` example - without KVS, the `streamHandler` can be neatly implemented with one `pipeline`:
If we want to switch filesystem writes for KVS, though, the API gets in the way:
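
As a rough illustration of that friction (a sketch, not the issue's original snippet), assume a hypothetical `StreamParamStore` whose `setValue` takes a `Readable` as an argument, standing in for the real `KeyValueStore`. The `upperCase` step and the `storeThroughBridge` helper are likewise made up for the example. To put such a store at the end of a `pipeline`, the caller needs a `PassThrough` bridge and has to await two promises together:

```typescript
import { PassThrough, Readable, Transform } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical in-memory stand-in for a store whose setValue() accepts a
// Readable as a parameter; it is not the Crawlee KeyValueStore class.
class StreamParamStore {
    readonly records = new Map<string, Buffer>();

    async setValue(key: string, value: Readable): Promise<void> {
        const chunks: Buffer[] = [];
        for await (const chunk of value) chunks.push(Buffer.from(chunk));
        this.records.set(key, Buffer.concat(chunks));
    }
}

// Illustrative processing step: upper-cases every chunk passing through.
function upperCase(): Transform {
    return new Transform({
        transform(chunk, _encoding, callback) {
            callback(null, chunk.toString().toUpperCase());
        },
    });
}

async function storeThroughBridge(
    store: StreamParamStore,
    key: string,
    source: Readable,
): Promise<void> {
    // The store cannot be a pipeline step itself, so a PassThrough is used as
    // the pipeline destination and handed to setValue() as the value.
    const bridge = new PassThrough();

    // Both sides must run concurrently: setValue() drains the bridge while
    // the pipeline fills it.
    await Promise.all([
        store.setValue(key, bridge),
        pipeline(source, upperCase(), bridge),
    ]);
}
```

The extra `PassThrough` and the `Promise.all` are the async/await and stream interleaving that a `Writable`-returning API would make unnecessary.
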
The interleaving of the async/await model and the stream interfaces simply feels unergonomic.
This is mostly because **while `kvs.setValue` supports streams, it accepts a stream instance as a parameter** (instead of implementing a `Writable` that could be `pipe`d to or used as a `pipeline` step).
`KVS.getValue` currently doesn't seem to support streams at all.
Motivation
Simplify work with large files in the Crawlee / Apify ecosystem.
Ideal solution or implementation, and any additional constraints
Not sure. Perhaps adding methods like
and
would be sufficient.
This might require adding the same methods into the client(?) and / or SDK.
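
One possible shape for such methods, as a minimal sketch only: the `StreamingStore` class and the names `createValueWriteStream` and `createValueReadStream` are hypothetical and are not existing Crawlee, Apify client, or SDK APIs. The in-memory buffering is there only to keep the example self-contained; a real implementation would presumably forward chunks to the storage backend instead of collecting them.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical in-memory store; the method names below are made up for
// illustration and do not exist in Crawlee or the Apify SDK.
class StreamingStore {
    private readonly records = new Map<string, Buffer>();

    // Returns a Writable that collects chunks and commits the record once
    // the stream is finished.
    createValueWriteStream(key: string): Writable {
        const chunks: Buffer[] = [];
        const records = this.records;
        return new Writable({
            write(chunk, _encoding, callback) {
                chunks.push(Buffer.from(chunk));
                callback();
            },
            final(callback) {
                records.set(key, Buffer.concat(chunks));
                callback();
            },
        });
    }

    // Returns a Readable over the stored record; throws if the key is missing.
    createValueReadStream(key: string): Readable {
        const record = this.records.get(key);
        if (record === undefined) throw new Error(`Record "${key}" not found`);
        return Readable.from([record]);
    }
}

// With both ends exposed as streams, copying a record is a single pipeline.
async function copyRecord(store: StreamingStore, from: string, to: string): Promise<void> {
    await pipeline(store.createValueReadStream(from), store.createValueWriteStream(to));
}
```

With this shape the store can sit at either end of a `pipeline`, so no bridge stream or parallel promise is needed on the caller's side.
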
Alternative solutions or implementations
For pluggable stream-enabled `setValue`, see examples above.
For stream-enabled `getValue` (on Apify), users afaik have to use the API directly using an HTTP client.
I don't currently know any hack for streaming the FS-backed `memory-storage` KVS records.
Other context
No response