Add multi-provider support with OpenAI integration #11

Open · wants to merge 1 commit into `main`
75 changes: 75 additions & 0 deletions PROVIDER_DEBUGGING.md
@@ -0,0 +1,75 @@
# Grunty AI Multi-Provider Debugging Report

## Issue Summary
The Grunty AI application was experiencing problems with its multi-provider support, particularly when switching between Anthropic and OpenAI providers. The issues included:

1. Missing error handling during provider switching
2. Lack of proper error feedback to users
3. Initialization issues with the OpenAI provider
4. Missing log functionality in the UI

## Implemented Fixes

### 1. Enhanced Error Logging
- Added detailed logging throughout the application with file names and line numbers (a minimal setup is sketched below)
- Added console logging for immediate feedback during development
- Added stack trace logging for better debugging
- Improved log formatting for better readability
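
As a rough illustration (not the project's actual code), a logging setup along these lines puts file names and line numbers in every record and mirrors output to the console during development:

```python
import logging
import sys

def configure_logging(level: int = logging.DEBUG) -> None:
    """Attach a console handler whose format includes the file name and line number."""
    formatter = logging.Formatter(
        "%(asctime)s [%(levelname)s] %(filename)s:%(lineno)d - %(message)s"
    )
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(console)

# Stack traces can then be captured at call sites with logger.exception(...)
# or logger.error(..., exc_info=True).
```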

### 2. Improved Provider Initialization
- Added proper initialization checks in the OpenAI provider
- Added verification of API key availability
- Added an API test call during initialization to verify connectivity (sketched below)
- Better error handling during provider creation and initialization
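
The sketch below shows roughly what such an initialization check could look like; the class name, the `models.list()` test call, and the environment-variable fallback are assumptions, not the exact code in `src/openai_provider.py`:

```python
import os
import logging
from typing import Optional

logger = logging.getLogger(__name__)

class OpenAIProviderSketch:
    """Illustrative only -- not the actual OpenAIProvider shipped in this PR."""

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        self.client = None

    def initialize(self) -> bool:
        # Verify API key availability before creating a client.
        if not self.api_key:
            logger.error("OPENAI_API_KEY is not set")
            return False
        try:
            from openai import OpenAI
            self.client = OpenAI(api_key=self.api_key)
            # Cheap test call to verify connectivity and credentials.
            self.client.models.list()
            return True
        except Exception as exc:
            logger.error("OpenAI initialization failed: %s", exc)
            return False
```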

### 3. Enhanced Provider Switching
- Added more robust provider switching logic in the store (illustrated below)
- Only recreate provider instances when necessary
- Proper error handling and recovery during provider switching
- Added user feedback through error dialogs when provider switching fails
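
A hypothetical sketch of this switching logic; the store class, its attribute names, and the error callback are made up for illustration, and the import assumes the repository root is on `sys.path`. Only `AIProviderManager` comes from this PR:

```python
from typing import Callable, Optional

from src.ai_providers import AIProvider, AIProviderManager

class ProviderStoreSketch:
    """Illustrative only -- the real store lives in the application code."""

    def __init__(self) -> None:
        self.provider_name: Optional[str] = None
        self.provider: Optional[AIProvider] = None

    def switch_provider(self, name: str, show_error: Callable[[str], None] = print) -> bool:
        # Reuse the current instance if the provider has not actually changed.
        if name == self.provider_name and self.provider is not None:
            return True
        new_provider = AIProviderManager.create_provider(name)
        if new_provider is None:
            # Keep the previous provider so the app stays usable, and surface
            # the failure to the user (e.g. through an error dialog).
            show_error(f"Failed to switch to provider '{name}'")
            return False
        self.provider, self.provider_name = new_provider, name
        return True
```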

### 4. OpenAI Provider Improvements
- Implemented proper computer control support
- Fixed message handling for the OpenAI API responses
- Added robust error handling for tool calls (see the tool-call sketch below)
- Improved response handling for different message formats
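
For the tool-call handling, a hedged sketch using the openai Python client's Chat Completions tool-calling interface; the `computer_action` tool schema and the model choice are placeholders, not the schema this PR ships:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical "computer control" tool schema -- a placeholder for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "computer_action",
        "description": "Perform a mouse or keyboard action on the user's computer.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string"},
                "x": {"type": "integer"},
                "y": {"type": "integer"},
                "text": {"type": "string"},
            },
            "required": ["action"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Open the browser"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        # Guard against malformed arguments instead of crashing the agent loop.
        try:
            args = json.loads(call.function.arguments)
        except json.JSONDecodeError:
            args = {}
        print(call.function.name, args)
else:
    # Plain text reply with no tool call.
    print(message.content)
```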

### 5. UI Improvements
- Added the missing `log` method to the MainWindow class (a minimal version is sketched below)
- Improved error message display in the UI
- Added better user feedback during provider operations
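
A minimal sketch of what the added `log` method could look like with PyQt6; the `log_view` widget name and layout are assumptions, not the app's actual UI code:

```python
from PyQt6.QtWidgets import QMainWindow, QPlainTextEdit

class MainWindowSketch(QMainWindow):
    """Illustrative only -- the real MainWindow is defined in the app's UI module."""

    def __init__(self) -> None:
        super().__init__()
        self.log_view = QPlainTextEdit(self)  # hypothetical read-only log pane
        self.log_view.setReadOnly(True)
        self.setCentralWidget(self.log_view)

    def log(self, message: str) -> None:
        # Append a line to the in-app log so provider errors stay visible to the user.
        self.log_view.appendPlainText(message)
```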

### 6. Dependency Management
- Better handling of optional dependencies
- Clear error messages when required packages are missing
- Graceful degradation when non-essential packages are unavailable (example below)
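
One way to implement this, sketched here with the optional voice packages from `requirements.txt` as the example; the `optional_import` helper is hypothetical:

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def optional_import(module_name: str, feature: str):
    """Return the module if available, otherwise log a warning and return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.warning(
            "Optional package '%s' is not installed; %s will be disabled. "
            "Install it with: pip install %s", module_name, feature, module_name
        )
        return None

# Example: voice control degrades gracefully when speech packages are missing.
speech_recognition = optional_import("speech_recognition", "voice input")
pyttsx3 = optional_import("pyttsx3", "text-to-speech")
VOICE_AVAILABLE = speech_recognition is not None and pyttsx3 is not None
```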

## Configuration
The application requires proper configuration in a `.env` file:

```
ANTHROPIC_API_KEY=your_anthropic_key
OPENAI_API_KEY=your_openai_key
DEFAULT_AI_PROVIDER=anthropic
```
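
These values can be loaded with `python-dotenv`, which is already a dependency; a minimal sketch:

```python
import os

from dotenv import load_dotenv  # python-dotenv is already listed in requirements.txt

load_dotenv()  # reads the .env file from the current working directory

anthropic_key = os.getenv("ANTHROPIC_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")
# Falling back to "anthropic" here is an assumption, not necessarily the app's behavior.
default_provider = os.getenv("DEFAULT_AI_PROVIDER", "anthropic")
```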

## Testing

A new test script `test_providers.py` has been created to validate the provider functionality independently of the main application. This script tests:
- Anthropic provider creation and initialization
- OpenAI provider creation and initialization
- Provider manager functionality

All tests are passing, confirming that both providers are working correctly.
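
The test script itself is not included in this diff, so the outline below is only a guess at its shape based on the description above; it assumes valid API keys in `.env` and that the repository root is on `sys.path`:

```python
# test_providers.py -- hypothetical outline, not the script included in the PR
from dotenv import load_dotenv

from src.ai_providers import AIProviderManager

load_dotenv()

def test_provider(name: str) -> bool:
    provider = AIProviderManager.create_provider(name)
    if provider is None:
        print(f"[FAIL] {name}: creation or initialization failed")
        return False
    print(f"[OK] {name}: created and initialized")
    return True

if __name__ == "__main__":
    names = AIProviderManager.get_provider_names()
    assert "anthropic" in names and "openai" in names
    results = [test_provider(name) for name in names]
    print("All tests passed" if all(results) else "Some tests failed")
```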

## Recommendations for Future Work

1. **Comprehensive Error Handling**: Add more specific error checks for different API errors
2. **Provider Configuration UI**: Add a dedicated settings page for provider configuration
3. **API Key Management**: Implement secure storage and management of API keys
4. **Automated Testing**: Expand the test coverage to include more complex scenarios
5. **New Providers**: Create a template for adding new AI providers easily

## Conclusion

The multi-provider support in Grunty AI is now working correctly. Users can switch between Anthropic and OpenAI providers with proper error handling and feedback. The application is more robust and user-friendly.
33 changes: 28 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
# 👨🏽‍💻 Grunty

Self-hosted desktop app to have AI control your computer, powered by the new Claude [computer use](https://www.anthropic.com/news/3-5-models-and-computer-use) capability. Allow Claude to take over your laptop and do your tasks for you (or at least attempt to, lol). Written in Python, using PyQt.
Self-hosted desktop app to have AI control your computer, powered by the Claude [computer use](https://www.anthropic.com/news/3-5-models-and-computer-use) capability and OpenAI's GPT models. Allow AI to take over your laptop and do your tasks for you (or at least attempt to, lol). Written in Python, using PyQt.

## Demo
Here, I asked it to use [vim](https://vim.rtorr.com/) to create a game in Python, run it, and play it.
@@ -15,31 +15,51 @@ Video was sped up 8x btw. [Computer use](https://www.anthropic.com/news/3-5-models-and-computer-use)

2. **Tread Lightly** - If it wipes your computer, sends weird emails, or orders 100 pizzas... that's on you.

Anthropic can see your screen through screenshots during actions. Hide sensitive information or private stuff.
AI providers can see your screen through screenshots during actions. Hide sensitive information or private stuff.

## 🎯 Features
- Literally ask AI to do ANYTHING on your computer that you do with a mouse and keyboard. Browse the web, write code, blah blah.
- **Multiple AI providers support**: Switch between Anthropic Claude and OpenAI models
- **Model selection**: Choose from various models for each provider
- **Theme toggling**: Light/Dark mode support
- **System tray integration**: Minimize to tray and run in background
- **Optional voice control**: Experimental voice input and text-to-speech support

# 💻 Platforms
- Anything you can run Python on: MacOS, Windows, Linux, etc.

## 🛠️ Setup

Get an Anthropic API key [here]([https://console.anthropic.com/keys](https://console.anthropic.com/dashboard)).
Get an Anthropic API key [here](https://console.anthropic.com/dashboard) and/or an OpenAI API key [here](https://platform.openai.com/api-keys).

```bash
# Python 3.10+ recommended
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

# Add API key to .env
# Add API keys to .env
echo "ANTHROPIC_API_KEY=your-key-here" > .env
echo "OPENAI_API_KEY=your-key-here" >> .env
echo "DEFAULT_AI_PROVIDER=anthropic" >> .env # or "openai"

# Run
python run.py
```

## 🧠 Supported AI Providers and Models

### Anthropic
- Claude 3.5 Sonnet
- Claude 3 Opus
- Claude 3 Sonnet
- Claude 3 Haiku

### OpenAI
- GPT-4o
- GPT-4 Turbo
- GPT-4

## 🔑 Productivity Keybindings
- `Ctrl + Enter`: Execute the current instruction
- `Ctrl + C`: Stop the current agent action
@@ -50,10 +70,13 @@ python run.py
- Claude really loves Firefox. You might want to install it for better UI detection and accurate mouse clicks.
- Be specific and explicit, help it out a bit
- Always monitor the agent's actions
- Different models have different computer-control capabilities; experiment to find the one that works best for your tasks

## 🐛 Known Issues

- Sometimes, it doesn't take a screenshot to validate that the input is selected, and types stuff in the wrong place.. Press CMD+C to end the action when this happens, and quit and restart the agent. I'm working on a fix.
- Sometimes, the AI doesn't take a screenshot to validate that the input is selected, and types stuff in the wrong place. Press CMD+C to end the action when this happens, and quit and restart the agent.
- Not all models support full computer control with the same level of capability
- Voice control is experimental and may not work reliably on all platforms

## 🤝 Contributing

18 changes: 12 additions & 6 deletions requirements.txt
@@ -1,12 +1,18 @@
# Core dependencies
PyQt6
pyautogui
requests
anthropic
python-dotenv
pillow
numpy
qtawesome
SpeechRecognition
pyttsx3
keyboard
pyaudio
requests

# AI Provider dependencies
anthropic>=0.15.0 # Required for Anthropic Claude
openai>=1.17.0 # Optional for OpenAI support

# Voice control dependencies (optional)
SpeechRecognition # Optional for voice input
pyttsx3 # Optional for text-to-speech
pyaudio # Optional for voice recording
keyboard # For keyboard shortcuts
172 changes: 172 additions & 0 deletions src/ai_providers.py
@@ -0,0 +1,172 @@
import os
import logging
from abc import ABC, abstractmethod
from typing import List, Dict, Any, Optional
from dotenv import load_dotenv

logger = logging.getLogger(__name__)

class AIProvider(ABC):
"""Base abstract class for AI providers that can control the computer."""

def __init__(self, api_key: Optional[str] = None):
self.api_key = api_key

@abstractmethod
def initialize(self) -> bool:
"""Initialize the client with API key and any needed setup.
Returns True if successful, False otherwise."""
pass

@abstractmethod
def get_next_action(self, run_history: List[Dict[str, Any]]) -> Any:
"""Get the next action from the AI based on the conversation history.

Args:
run_history: List of conversation messages.

Returns:
Response object from the AI provider.
"""
pass

@abstractmethod
def extract_action(self, response: Any) -> Dict[str, Any]:
"""Extract the action from the AI response.

Args:
response: Response object from the AI provider.

Returns:
Dict with the parsed action.
"""
pass

@abstractmethod
def display_assistant_message(self, message: Any, update_callback: callable) -> None:
"""Format and display the assistant's message.

Args:
message: The message from the assistant.
update_callback: Callback function to update the UI with the message.
"""
pass

@abstractmethod
def get_prompt_for_model(self, model_id: str) -> str:
"""Get the prompt formatted for the specific model.

Args:
model_id: The model ID to get the prompt for.

Returns:
Formatted prompt string.
"""
pass

@staticmethod
def get_available_models() -> List[Dict[str, str]]:
"""Get a list of available models for this provider.

Returns:
List of dictionaries with model information.
"""
return []

@staticmethod
def default_model() -> str:
"""Get the default model ID for this provider.

Returns:
Default model ID string.
"""
return ""

# Manager class to handle multiple AI providers
class AIProviderManager:
"""Manager for different AI provider integrations."""

PROVIDERS = {
"anthropic": "AnthropicProvider",
"openai": "OpenAIProvider"
# Add more providers here as they are implemented
}

@staticmethod
def get_provider_names() -> List[str]:
"""Get a list of available provider names.

Returns:
List of provider name strings.
"""
return list(AIProviderManager.PROVIDERS.keys())

@staticmethod
def create_provider(provider_name: str, **kwargs) -> Optional[AIProvider]:
"""Factory method to create an AI provider.

Args:
provider_name: Name of the provider to create.
**kwargs: Additional arguments to pass to the provider constructor.

Returns:
AIProvider instance or None if creation failed.
"""
logger.info(f"Creating AI provider: {provider_name} with kwargs: {kwargs}")

# Dynamically import providers without circular imports
if provider_name == "anthropic":
try:
from .anthropic_provider import AnthropicProvider
provider = AnthropicProvider(**kwargs)
success = provider.initialize()
if success:
logger.info(f"Successfully created and initialized AnthropicProvider")
return provider
else:
logger.error(f"Failed to initialize AnthropicProvider")
return None
except ImportError as e:
logger.error(f"Failed to import AnthropicProvider: {str(e)}")
return None
except Exception as e:
import traceback
logger.error(f"Error creating AnthropicProvider: {str(e)}\n{traceback.format_exc()}")
return None
elif provider_name == "openai":
try:
# First check if openai package is installed
try:
import openai
logger.info("OpenAI package found")
except ImportError as e:
logger.error(f"OpenAI package not installed: {str(e)}")
return None

# Then try to import our provider
from .openai_provider import OpenAIProvider
logger.info("Creating OpenAIProvider instance")
provider = OpenAIProvider(**kwargs)

# Initialize the provider
logger.info("Initializing OpenAIProvider")
success = provider.initialize()

if success:
logger.info("Successfully created and initialized OpenAIProvider")
return provider
else:
logger.error("Failed to initialize OpenAIProvider")
return None
except ImportError as e:
logger.error(f"Failed to import OpenAIProvider: {str(e)}")
return None
except Exception as e:
import traceback
logger.error(f"Error creating OpenAIProvider: {str(e)}\n{traceback.format_exc()}")
return None

# Add more provider imports here as they are implemented

logger.error(f"Unknown provider name: {provider_name}")
return None