
Removed orphan assets #3567


Merged
merged 2 commits into from
Apr 22, 2025
Binary file removed docs/book/.gitbook/assets/01_local_stack.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/01_personal_settings.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/02_multiple_stacks.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/02_project_settings.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/03_invite_token.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/03_multiple_users.png
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/04_register_stack.gif
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/04_sign_up.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/Dashboard.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/Login_to_dashboard.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/NeptuneUI.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/PipelineVersion.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/RemoteServer.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/Remote_with_git_ops.png
Binary file not shown.
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/Scenario3.1.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/Scenario3.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/SimplePipelineDag.png
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/admin_role.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/alerter.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/annotator.png
Binary file not shown.
Binary file removed docs/book/.gitbook/assets/architecture_diagram.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/artifact-store.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/artifact_exchange.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/artifact_store_deploy.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/assign_permissions.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/cached.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/cached_run_dashboard.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ci-cd-local.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ci-cd-prod.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ci-cd-staging.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/cloud-user-invite-flow.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/component-guide.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/container-registry.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/create_role_modal.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ct_cd_zenml.gif
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/customer_satisfaction.jpg
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/dag-visualizer.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/data-scientist.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/data-validator.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/docs-intro.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/drift-visualization.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/examples.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/expectation-suite (1).png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/experiment-tracker.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/extensibility.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/facets-visualization.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/failure_alerter.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/faq.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/feature-store.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/features.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/finetune_zenml_home.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/header.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/image-builder.png
Diff not rendered.
58 changes: 0 additions & 58 deletions docs/book/.gitbook/assets/interrogate.svg
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/intro-zenml-overview.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/intro_dashboard.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/intro_zenml_overview.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/llms-txt-thumb.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/localstack (1).png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/localstack.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/login_dashboard.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/mcp_model_register.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/mcp_pipeline_overview.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/meet_the_team.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ml-engineer.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/ml-platform.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/mlflow-ui-model-uri.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/model-deployer.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/model-registry.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/neptune_charts.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/new_dashboard_rn_2.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/new_dashboard_rn_3.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/new_dashboard_rn_4.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/new_tenant.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/new_tenant_modal.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/one-click-deployment.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/orchestrator.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/org_members.png
Diff not rendered.
1 change: 0 additions & 1 deletion docs/book/.gitbook/assets/oss-header.svg
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/overview.gif
Diff not rendered.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/pipelines_dashboard.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/pipelineversions.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/plugins_cli.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/quickstart.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rag-and-zenml.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rag_zenml_home.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/readme_basic_pipeline.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/readme_compute.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/readme_integrations.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/readme_mcp.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/readme_simple_pipeline.gif
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rest_api_step_1.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rest_api_step_2.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/role_page.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_1.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_2.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_3.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_4.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_5.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/rosetta_terminal_6.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/run_visualization.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/secret-scoping.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/seldon-model-deployer.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/served-models-cli.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/share_dialog.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/stack-list.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/stack-wizard-aws-auth.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/stack.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/stack_in_dashboard.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/starter-guide.png
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/statuspage.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/step-operator.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/summarize.jpeg
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/tenant_roles_page.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/validation-result (1).png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/vpc_zenml.png
Diff not rendered.
Binary file not shown.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-cloud-form.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-cloud-overview.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-hero.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-up (1).gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-up.gif
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml-why.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml_deploy.png
Diff not rendered.
Binary file removed docs/book/.gitbook/assets/zenml_logo.png
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-02.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-03.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-04.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-05.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-06.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-07.jpg
Diff not rendered.
Binary file removed docs/book/user-guide/.gitbook/assets/project-08.jpg
Diff not rendered.
@@ -58,8 +58,7 @@ We log the results for our core Matryoshka dimensions as model metadata to ZenML

It's possible to visualize results in a few different ways in ZenML, but one easy option is just to output your chart as a `PIL.Image` object. (See our [documentation on more ways to visualize your results](https://docs.zenml.io/how-to/data-artifact-management/visualize-artifacts).) The rest of the implementation of our `visualize_results` step is just simple `matplotlib` code to plot out the base model evaluation against the finetuned model evaluation. We represent the results as percentage values and horizontally stack the two sets to make comparison a little easier.

-![Visualizing finetuned embeddings evaluation
-results](../../../.gitbook/assets/finetuning-embeddings-visualization.png)
+![Visualizing finetuned embeddings evaluation results](../../../.gitbook/assets/finetuning-embeddings-visualization.png)

We can see that our finetuned embeddings have improved the recall of our retrieval system across all of the dimensions, but the results are still not amazing. In a production setting, we would likely want to focus on improving the data being used for the embeddings training. In particular, we could consider stripping out some of the logs output from the documentation, and perhaps omit some pages which offer low signal for the retrieval task. This embeddings finetuning was run purely on the full set of synthetic data generated by `distilabel` and `gpt-4o`, so we wouldn't necessarily expect to see huge improvements out of the box, especially when the underlying data chunks are complex and contain multiple topics.

@@ -15,8 +15,7 @@ Our pipeline for finetuning the embeddings is relatively simple. We'll do the fo
* evaluate the base and finetuned embeddings
* visualize the results of the evaluation

-![Embeddings finetuning pipeline with Sentence Transformers and
-ZenML](../../../.gitbook/assets/rag-finetuning-embeddings-pipeline.png)
+![Embeddings finetuning pipeline with Sentence Transformers and ZenML](../../../.gitbook/assets/rag-finetuning-embeddings-pipeline.png)

### Loading data

40 changes: 24 additions & 16 deletions scripts/find_orphaned_assets.py
@@ -177,8 +177,8 @@ def extract_asset_references(md_file):

 def get_human_readable_size(size_bytes):
     """Convert bytes to a human-readable format (KB, MB, GB)."""
-    for unit in ['B', 'KB', 'MB', 'GB']:
-        if size_bytes < 1024.0 or unit == 'GB':
+    for unit in ["B", "KB", "MB", "GB"]:
+        if size_bytes < 1024.0 or unit == "GB":
             break
         size_bytes /= 1024.0
     return f"{size_bytes:.2f} {unit}"
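As a quick sanity check of the helper touched in this hunk, the sketch below copies `get_human_readable_size` as it appears in the diff and calls it on a few byte counts. The sample values are illustrative only and are not part of the PR; note that anything above the GB range is still reported in GB because that is the last unit in the list.

```python
def get_human_readable_size(size_bytes):
    """Convert bytes to a human-readable format (B, KB, MB, GB)."""
    for unit in ["B", "KB", "MB", "GB"]:
        # Stop once the value fits the current unit, or when GB
        # (the largest supported unit) has been reached.
        if size_bytes < 1024.0 or unit == "GB":
            break
        size_bytes /= 1024.0
    return f"{size_bytes:.2f} {unit}"


if __name__ == "__main__":
    # Illustrative inputs: bytes, kilobytes, megabytes, and a multi-terabyte value.
    for sample in (512, 1536, 5 * 1024**2, 3 * 1024**4):
        print(sample, "->", get_human_readable_size(sample))
```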
@@ -232,8 +232,10 @@ def print_default_format(orphaned_assets, total_assets):
     """Print in the default format optimized for deletion."""
     total_size = calculate_total_size(orphaned_assets)
     human_readable_size = get_human_readable_size(total_size)
-
-    print(f"Found {len(orphaned_assets)} orphaned assets out of {total_assets} total assets:")
+
+    print(
+        f"Found {len(orphaned_assets)} orphaned assets out of {total_assets} total assets:"
+    )
     print(f"Total space that would be saved: {human_readable_size}")
     print("\n# Orphaned Assets")
     print("# Format: Files separated by spaces")
@@ -295,7 +297,7 @@ def print_readable_format(orphaned_assets, total_assets):
# Group orphaned assets by their .gitbook/assets directory
by_assets_dir = defaultdict(list)
sizes_by_dir = defaultdict(int)

for asset in orphaned_assets:
# Find the .gitbook/assets part of the path
path_parts = asset.split(os.path.sep)
@@ -324,36 +326,38 @@ def print_readable_format(orphaned_assets, total_assets):
else assets_dir
)
dir_size = get_human_readable_size(sizes_by_dir[assets_dir])
-print(f"📁 {rel_dir}/.gitbook/assets/ ({len(files)} files, {dir_size})")
+print(
+    f"📁 {rel_dir}/.gitbook/assets/ ({len(files)} files, {dir_size})"
+)
print("─" * term_width)

# Group by extension within each directory
by_ext = defaultdict(list)
sizes_by_ext = defaultdict(int)

for asset in files:
ext = os.path.splitext(asset)[1].lower()
by_ext[ext].append(asset)
sizes_by_ext[ext] += os.path.getsize(asset)

for ext, ext_files in sorted(by_ext.items()):
ext_size = get_human_readable_size(sizes_by_ext[ext])
print(f" {ext} files ({len(ext_files)}, {ext_size})")

for i, file in enumerate(sorted(ext_files), 1):
filename = os.path.basename(file)
abs_file_path = os.path.abspath(file)
file_size = os.path.getsize(file) / 1024 # Size in KB

# Create a clickable link
clickable_link = f"file://{abs_file_path}"

# Print file with its clickable link and size
print(f" {i:2d}. {filename} ({file_size:.1f} KB)")
print(f" 👉 {clickable_link}")

print() # Extra line between extensions

print() # Extra line between directories
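The per-extension grouping in this hunk can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper name `group_by_extension` and throwaway sample files created in a temporary directory; neither exists in the PR. It mirrors the `defaultdict` pattern from the diff: one mapping collects paths per lowercased extension, the other accumulates their sizes.

```python
import os
import tempfile
from collections import defaultdict


def group_by_extension(paths):
    """Return ({ext: [paths]}, {ext: total_bytes}) for the given files.

    Hypothetical helper: it wraps the grouping loop shown in the diff.
    """
    by_ext = defaultdict(list)
    sizes_by_ext = defaultdict(int)
    for path in paths:
        # Lowercase the extension so ".PNG" and ".png" land in the same bucket.
        ext = os.path.splitext(path)[1].lower()
        by_ext[ext].append(path)
        sizes_by_ext[ext] += os.path.getsize(path)
    return by_ext, sizes_by_ext


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative sample files (name -> size in bytes), not real assets.
        samples = {"first.png": 10, "second.PNG": 20, "clip.gif": 5}
        paths = []
        for name, size in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(b"\0" * size)
            paths.append(path)

        by_ext, sizes_by_ext = group_by_extension(paths)
        for ext in sorted(by_ext):
            print(f"{ext} files ({len(by_ext[ext])}, {sizes_by_ext[ext]} bytes)")
```

Using `defaultdict` avoids a membership check before each append or increment, which keeps the loop body to three lines.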


@@ -416,16 +420,20 @@ def main():
total_files = len(orphaned_assets)
total_size = calculate_total_size(orphaned_assets)
human_readable_size = get_human_readable_size(total_size)

confirm = input(
f"⚠️ WARNING: This will delete {total_files} image assets ({human_readable_size}). Type 'YES' to confirm: "
)

if confirm.strip() == "YES":
-deleted_count, failed_count, deleted_size = delete_files(orphaned_assets)
+deleted_count, failed_count, deleted_size = delete_files(
+    orphaned_assets
+)
deleted_size_str = get_human_readable_size(deleted_size)

-print(f"\n✅ Deleted {deleted_count} assets successfully, freeing up {deleted_size_str}.")
+print(
+    f"\n✅ Deleted {deleted_count} assets successfully, freeing up {deleted_size_str}."
+)
if failed_count > 0:
print(f"❌ Failed to delete {failed_count} assets.")
else:
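The confirm-then-delete flow in this hunk can be sketched on its own. The diff only shows the call to `delete_files` and the three values it returns, so the function body below is an assumption about how such a helper might work, not the script's actual implementation. `confirm_and_delete` is likewise a hypothetical wrapper that takes the typed answer as an argument so the logic can be exercised without `input()`.

```python
import os
import tempfile


def delete_files(paths):
    """Delete each path; return (deleted_count, failed_count, deleted_size).

    Assumed implementation: the PR shows only the call site and return tuple.
    """
    deleted_count = failed_count = deleted_size = 0
    for path in paths:
        try:
            size = os.path.getsize(path)
            os.remove(path)
        except OSError:
            # Missing files or permission problems count as failures.
            failed_count += 1
        else:
            deleted_count += 1
            deleted_size += size
    return deleted_count, failed_count, deleted_size


def confirm_and_delete(paths, answer):
    """Delete only if the answer is exactly 'YES' after stripping whitespace."""
    if answer.strip() != "YES":
        return None  # Nothing is deleted without explicit confirmation.
    return delete_files(paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "orphan.png")
        with open(target, "wb") as handle:
            handle.write(b"\0" * 2048)
        missing = os.path.join(tmp, "already_gone.png")

        print(confirm_and_delete([target, missing], "yes"))  # lowercase: refused
        print(confirm_and_delete([target, missing], "YES"))  # confirmed
```

Requiring the exact string `YES` rather than accepting `y` or an empty answer makes an accidental keypress harmless, which matters for an operation that removes files.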