addy999/onequery

OneQuery

🔨 Note: This repository is still in development. Contributions and feedback are welcome!

Setup

  • Install the requirements: pip install -r requirements.txt
  • Install a browser: python -m playwright install
    • This project uses Playwright to control the browser. The command above installs Playwright's default browsers; to install only one, pass its name (for example, python -m playwright install chromium).
  • Write your environment variables to a .env file (see .env.test for an example)
  • Install OmniParser
    • For webpage analysis, we use the OmniParser model from Hugging Face. You'll need to host it locally and expose it through an API.
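Before the first run it can help to confirm that the .env file actually defines everything you expect. The sketch below is a minimal, standard-library check and is not part of this repository: the helper names and the variable names in the sample are hypothetical placeholders, so substitute the keys listed in .env.test.

```python
def load_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical variable names, for illustration only.
sample = """
# example .env content
EXAMPLE_API_KEY=abc123
EXAMPLE_PARSER_URL="http://127.0.0.1:9000"
"""

env = load_env_text(sample)
problems = missing_keys(env, ["EXAMPLE_API_KEY", "EXAMPLE_MODEL_NAME"])
print(problems)
```

In practice you would read the text from the .env file on disk and pass the key names taken from .env.test.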

Examples

  • Finding issues on a GitHub repo

Video Demo 1

  • Finding live events

Video Demo 2

Usage

General query with no source to start with

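from pydantic import BaseModel

# WebScraper is defined in this repository; import it from the module that provides it.

task = "Find 2 recent issues from PyTorch repository."

# Shape of a single result item.
class IssueModel(BaseModel):
    date: str
    title: str
    author: str
    description: str

# Shape of the full result.
class OutputModel(BaseModel):
    issues: list[IssueModel]

# No start URL is known, so None is passed as the second argument.
scraper = WebScraper(task, None, OutputModel)
scraper.run()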

If you know the URL

start_url = "https://in.bookmyshow.com/"
task = "Find 5 events happening in Bangalore this week."

class EventsModel(BaseModel):
    name: str
    date: str
    location: str

class OutputModel(BaseModel):
    events: list[EventsModel]

scraper = WebScraper(task, start_url, OutputModel)
scraper.run()
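The output model in the examples above describes the JSON shape the scraper is asked to fill in. As a rough illustration of what that contract means, the sketch below checks a JSON string against the same events shape using only the standard library. It uses dataclasses as a stand-in for the pydantic models, the sample values are made up, and nothing here reflects how WebScraper validates results internally.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Event:
    name: str
    date: str
    location: str


@dataclass
class Output:
    events: list


def parse_output(raw: str) -> Output:
    """Build an Output from JSON text, rejecting events with missing fields."""
    data = json.loads(raw)
    required = {f.name for f in fields(Event)}
    events = []
    for item in data["events"]:
        missing = required - item.keys()
        if missing:
            raise ValueError(f"event is missing fields: {sorted(missing)}")
        events.append(Event(**{name: item[name] for name in required}))
    return Output(events=events)


# Made-up sample data, for illustration only.
raw = '{"events": [{"name": "Sample Concert", "date": "2025-01-01", "location": "Bangalore"}]}'
parsed = parse_output(raw)
print(parsed.events[0].name)
```

With pydantic the same check comes from the model classes themselves, which is why the examples only need to declare the fields.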

Serving with a REST API

Server:

pip install "fastapi[all]"
uvicorn server:app --reload

Client:

import requests

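# uvicorn listens on 127.0.0.1:8000 by default; 0.0.0.0 is not a usable client address on every platform.
url = "http://127.0.0.1:8000/scrape"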

payload = {
    "start_url": "http://example.com",
    "task": "Scrape the website for data",
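    # NOTE: (str, ...) is pydantic field-definition syntax. Python types and
    # Ellipsis cannot be encoded as JSON, so json=payload will fail to
    # serialize this as written; send the schema in the JSON form that
    # server.py accepts.
    "schema": {
        "title": (str, ...),
        "description": (str, ...)
    }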
}

response = requests.post(url, json=payload)

print(response.status_code)
print(response.json())
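A request body sent with json=payload has to be plain JSON, and Python types such as str or the Ellipsis object (...) cannot be encoded that way. One possible approach, sketched below, is to send type names as strings and convert them back to (type, ...) pairs on the receiving side, which is the field-definition form that pydantic's create_model accepts. The TYPE_NAMES mapping and the to_field_definitions helper are hypothetical; the wire format that server.py actually expects is not documented here, so treat this as an assumption.

```python
import json

# Hypothetical mapping from JSON-friendly type names to Python types.
TYPE_NAMES = {"str": str, "int": int, "float": float, "bool": bool}


def to_field_definitions(schema: dict[str, str]) -> dict[str, tuple]:
    """Turn {"title": "str"} into {"title": (str, ...)}; ... marks a required field."""
    definitions = {}
    for name, type_name in schema.items():
        if type_name not in TYPE_NAMES:
            raise ValueError(f"unsupported type name: {type_name!r}")
        definitions[name] = (TYPE_NAMES[type_name], ...)
    return definitions


payload = {
    "start_url": "http://example.com",
    "task": "Scrape the website for data",
    "schema": {"title": "str", "description": "str"},
}

# Serializes cleanly because every value is a plain JSON type.
body = json.dumps(payload)

# What a receiver could do with the decoded schema.
field_definitions = to_field_definitions(json.loads(body)["schema"])
print(field_definitions)
```

The resulting dictionary could then be unpacked into a model factory such as pydantic's create_model, if the server builds its output model dynamically.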

💡 Tip: For a hosted solution with a lightning-fast, Zig-based browser, worldwide proxy support, and a job queuing system, check out onequery.app.

Testing

In the works

Status

  • ✅ Basic functionality
  • 🛠️ Testing
  • 🛠️ Documentation

Architecture

(needs to be revised)

Flowchart

graph TD;
    A[Text Query] --> B[WebLLM];
    B --> C[Browser Instructions];
    C --> D[Browser Execution];
    D --> E[OmniParser];
    E --> F[Screenshot & Structured Info];
    F --> G[AI];
    C --> G;
    G --> H[JSON Output];
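
The data flow in the flowchart can be read as a chain of function calls. The sketch below mirrors the edges of the graph with stub functions so the order of the stages is easy to follow. Every function name and return value is a hypothetical placeholder; none of them calls a real model, browser, or OmniParser, and the repository's actual code may be organized differently.

```python
import json
from dataclasses import dataclass


@dataclass
class PageObservation:
    screenshot: bytes
    elements: list[str]


def plan_instructions(query: str) -> list[str]:
    # Stand-in for the WebLLM step: turn a text query into browser instructions.
    return [f"search: {query}"]


def execute_in_browser(instructions: list[str]) -> bytes:
    # Stand-in for browser execution: returns fake screenshot bytes.
    return b"fake-screenshot"


def parse_page(screenshot: bytes) -> PageObservation:
    # Stand-in for OmniParser: screenshot in, screenshot plus structured info out.
    return PageObservation(screenshot=screenshot, elements=["placeholder element"])


def extract_json(instructions: list[str], observation: PageObservation) -> str:
    # Stand-in for the final AI step, which sees both the instructions and the parsed page.
    return json.dumps({"steps": len(instructions), "elements": observation.elements})


def run_pipeline(query: str) -> str:
    instructions = plan_instructions(query)          # A -> B -> C
    screenshot = execute_in_browser(instructions)    # C -> D
    observation = parse_page(screenshot)             # D -> E -> F
    return extract_json(instructions, observation)   # F -> G, C -> G, G -> H


result = run_pipeline("Find 2 recent issues")
print(result)
```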

Stack

Alternatives
