# WebScraper Development Guidelines
## Project Overview
WebScraper is a browser automation tool that uses WebSocket communication between a Python backend and a browser extension to enable automated web interactions.
## Critical Requirements
### Extension Requirements
- Content Script MUST be used for page interactions
- Eval MUST be enabled for dynamic script execution
- Content Script should be injected at document_start
- All script execution must happen through the Content Script
### JavaScript Execution via Chrome Extension Guidelines
1. Browser Context Rules:
- All JS files in `/commands/js` must be pure browser JavaScript
- No Node.js/CommonJS/ES modules (no require/import/export)
- No external dependencies that need bundling
- Code must be self-contained and executable via eval()
- Return values must be JSON-serializable objects
2. Code Structure:
- Each file should contain all needed functions
- Main execution should be at the end of file
- Use direct return statements (no Promises/async)
- Avoid global scope pollution
- Use IIFE if needed for scope isolation
3. Return Format:
- Return plain objects only
- No class instances or complex types
- No DOM elements or functions
- No circular references
- Ensure all data is serializable
4. Browser APIs:
- Only use standard browser APIs
- No extension-specific APIs in JS files
- Handle API errors gracefully
- Fallback for unsupported features
- Test in isolated context
5. Security Considerations:
- No sensitive data in returned objects
- Sanitize all input/output
- No external script loading
- No XSS-vulnerable constructs
- Follow CSP restrictions
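Some of the rules above can be checked automatically on the Python side before a script is sent to the extension. The following is a minimal sketch, not the project's actual tooling: `find_module_syntax` and `is_plain_json` are hypothetical helper names, and the regular expressions are a rough heuristic (they can also match these keywords inside comments or multi-line strings).

```python
import json
import re

# Heuristic patterns for module syntax that cannot run under eval().
# These are assumptions for illustration, not an exhaustive parser.
_FORBIDDEN = {
    "require": re.compile(r"\brequire\s*\("),
    "import": re.compile(r"^\s*import\s", re.MULTILINE),
    "export": re.compile(r"^\s*export\s", re.MULTILINE),
}


def find_module_syntax(source: str) -> list[str]:
    """Return the names of forbidden module constructs found in a JS source."""
    return [name for name, pattern in _FORBIDDEN.items() if pattern.search(source)]


def is_plain_json(value: object) -> bool:
    """Return True if value survives a JSON round trip unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: circular reference.
        return False
```

A script could be rejected when `find_module_syntax` returns a non-empty list, and a result received from the extension could be logged as suspect when `is_plain_json` returns `False`.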
### Manifest Requirements
- Maintain the following content_security_policy in manifest.json:
```json
"content_security_policy": {
  "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'",
  "sandbox": "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self';"
}
```
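One way to guard this policy against accidental changes is a small check that runs after each build. The sketch below assumes the built manifest is available as a JSON file (for example in the `dist` directory); `check_csp` and `check_manifest_file` are hypothetical names, and the comparison is an exact string match, which is stricter than a real CSP parser would be.

```python
import json

# Expected values, copied from the manifest snippet above.
REQUIRED_CSP = {
    "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'",
    "sandbox": (
        "sandbox allow-scripts allow-forms allow-popups allow-modals; "
        "script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self';"
    ),
}


def check_csp(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the CSP matches."""
    csp = manifest.get("content_security_policy")
    if not isinstance(csp, dict):
        return ["content_security_policy is missing or not an object"]
    return [
        f"{key}: expected {expected!r}, found {csp.get(key)!r}"
        for key, expected in REQUIRED_CSP.items()
        if csp.get(key) != expected
    ]


def check_manifest_file(path: str) -> list[str]:
    """Load a manifest.json from disk and check its CSP."""
    with open(path, encoding="utf-8") as handle:
        return check_csp(json.load(handle))
```

Returning a list of problems rather than raising lets a build script print every mismatch at once.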
### Architecture Principles
- Python backend should maintain primary control
- Extension serves as a Remote Access Tool (RAT)
- Maximize control delegation to Python backend
- WebSocket communication should be reliable and secure
### Development Guidelines
1. Iterative Development:
- Focus on adding features or careful refactoring
- Complete codebase scan before refactoring
- Ensure changes maintain existing functionality
2. Code Quality:
- Follow type-safe practices
- Maintain proper error handling
- Keep WebSocket communication robust
- Document all changes thoroughly
3. Security:
- Maintain the manifest CSP required for script execution
- Keep content scripts in isolated world
- Validate all commands from Python backend
### Agent Responsibilities
1. Proactive Execution:
- Minimize user intervention
- Offer to run necessary commands
- Handle build and testing automatically
- Provide clear success/failure feedback
2. Code Review:
- Scan entire codebase before changes
- Verify manifest.json integrity
- Check content script functionality
- Validate WebSocket communication
3. Error Resolution:
- Address root causes
- Maintain security requirements
- Preserve architecture principles
- Document solutions thoroughly
## Project Setup and Running
### WebScraperExtension Setup
1. Build Requirements:
- Node.js and Bun package manager
- TypeScript for type checking
- Webpack for bundling
- Tailwind CSS for styling
2. Build Process:
- Clean dist directory before each build
- Build content script separately to maintain isolation
- Copy static assets including manifest.json
- Verify CSP rules are properly copied
3. Extension Loading:
- Load as unpacked extension in Chrome/Brave
- Verify background service worker starts
- Check content script injection
- Validate WebSocket connection
4. Common Commands:
```bash
# Clean and build extension
rm -rf dist && mkdir dist && bun run build
# Format code
bun run format
# Run type checking
bun run typecheck
```
### WebScraperPy Setup
1. Environment Setup:
- Python 3.12+ required
- Virtual environment recommended
- uv package manager for dependencies
- Development mode installation
2. Running Process:
- Start Python backend first
- Wait for WebSocket server to start
- Verify extension connection
- Handle command execution
3. Testing Flow:
- Run conversation extraction tests
- Verify WebSocket communication
- Check script execution
- Validate error handling
4. Common Commands:
```bash
# Setup virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install package
uv pip install -e .
# Run tests
python test_conversation.py
# Format and lint code
black . && ruff check .
```
### Running Order
1. Build extension:
```bash
cd WebScraperExtension
bun run build
```
2. Load extension in browser:
- Navigate to chrome://extensions
- Enable Developer Mode
- Load unpacked from dist directory
3. Start Python backend:
```bash
cd WebScraperPy
python example_extension.py
```
4. Verify Connection:
- Check the extension's background service worker console
- Verify WebSocket connection in Python logs
- Test basic navigation command
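Because the backend has to be listening before the extension can connect, a script that automates this sequence may want to wait for the server port first. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and it only confirms that a TCP connection is accepted, not that a WebSocket handshake succeeds.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False
```

The host and port to pass in are whatever the backend is configured to use; this document does not fix them.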
## Python Development Rules
### Critical Requirements
- Use Python 3.12+ features
- Maintain async/await patterns
- Handle WebSocket timeouts
- Implement proper error handling
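As an illustration of combining async/await with timeout handling, the sketch below wraps any awaitable reply in `asyncio.wait_for` and converts the timeout into a descriptive error. `with_timeout` and `CommandTimeoutError` are hypothetical names, not part of the project.

```python
import asyncio
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")


class CommandTimeoutError(Exception):
    """Hypothetical error raised when the extension does not reply in time."""


async def with_timeout(awaitable: Awaitable[T], seconds: float, what: str) -> T:
    """Await a reply, converting asyncio's timeout into a descriptive error."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError as exc:
        # Keep the original exception as the cause for debugging.
        raise CommandTimeoutError(f"{what} timed out after {seconds}s") from exc
```

Naming the operation in the `what` argument keeps the resulting log line useful when several commands are in flight.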
### Code Standards
- Follow PEP 8 style guide
- Use Python typing with all new code
- Document all classes and methods
- Keep WebSocket handlers and browser commands single-purpose
- Maximum line length: 88 characters (black formatter)
### Project Structure
```
WebScraper/
├── WebScraperPy/              # Python backend
│   ├── webscraper/
│   │   ├── browser/           # Browser implementations
│   │   ├── core/              # Core interfaces and types
│   │   └── utils/             # Shared utilities
│   ├── tests/                 # Test suite
│   └── example_*.py           # Example scripts
│
└── WebScraperExtension/       # Browser extension
    └── src/                   # Extension source code
```
### Development Guidelines
#### Python Standards
- Python version: ^3.12
- Use uv for dependency management
- Use websockets for WebSocket communication
- Use pydantic for message validation
- Use loguru for logging
- Follow black formatting rules
#### WebSocket Communication
- All messages must use ExtensionMessage format
- Handle connection errors gracefully
- Implement proper message validation
- Log all WebSocket events for debugging
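The validation step can be sketched as follows. The real fields of `ExtensionMessage` are not given here, so `id`, `type`, and `payload` are assumptions, and the class is deliberately named `ExtensionMessageSketch`. The project guidelines call for pydantic; this sketch swaps in a standard-library dataclass only to stay dependency-free.

```python
import json
from dataclasses import dataclass, field
from typing import Any


class MessageValidationError(ValueError):
    """Raised when an incoming frame does not match the expected shape."""


@dataclass(frozen=True)
class ExtensionMessageSketch:
    # Field names are assumptions for illustration only.
    id: str
    type: str
    payload: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "ExtensionMessageSketch":
        """Parse and validate one text frame received over the WebSocket."""
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise MessageValidationError(f"not valid JSON: {exc}") from exc
        if not isinstance(data, dict):
            raise MessageValidationError("message must be a JSON object")
        for key in ("id", "type"):
            if not isinstance(data.get(key), str) or not data[key]:
                raise MessageValidationError(f"missing or empty field: {key}")
        payload = data.get("payload", {})
        if not isinstance(payload, dict):
            raise MessageValidationError("payload must be an object")
        return cls(id=data["id"], type=data["type"], payload=payload)

    def to_json(self) -> str:
        """Serialize the message for sending."""
        return json.dumps(
            {"id": self.id, "type": self.type, "payload": self.payload}
        )
```

Rejecting malformed frames at the boundary means command handlers can assume well-formed input.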
#### Browser Automation
- Implement commands through ExtensionBrowser class
- Follow the BrowserInterface protocol
- Handle browser capabilities consistently
- Validate all browser commands
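To illustrate what following a protocol means in practice, the sketch below defines a structural interface with `typing.Protocol` and an in-memory stand-in that satisfies it. The method names `navigate` and `execute_script` are assumptions; the real `BrowserInterface` may declare a different set.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BrowserInterfaceSketch(Protocol):
    """Hypothetical shape of the protocol; the real method set may differ."""

    async def navigate(self, url: str) -> None: ...

    async def execute_script(self, script: str) -> Any: ...


class FakeBrowser:
    """In-memory stand-in for testing callers without a real browser."""

    def __init__(self) -> None:
        self.visited: list[str] = []

    async def navigate(self, url: str) -> None:
        # Validate the command before acting on it.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL scheme: {url}")
        self.visited.append(url)

    async def execute_script(self, script: str) -> Any:
        return {"ok": True, "length": len(script)}


async def open_page(browser: BrowserInterfaceSketch, url: str) -> None:
    """Callers depend on the protocol, not on a concrete browser class."""
    await browser.navigate(url)


if __name__ == "__main__":
    demo = FakeBrowser()
    asyncio.run(open_page(demo, "https://example.com"))
    print(demo.visited)
```

Because the protocol is structural, `FakeBrowser` needs no inheritance; any class with matching methods can be passed to `open_page`.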
#### Error Handling
- Use custom exceptions for WebSocket errors
- Provide context in error messages
- Log connection and command errors
- Handle browser disconnections gracefully
#### Documentation
- Document all message types
- Document browser capabilities
- Provide WebSocket connection examples
- Include browser command examples
#### Environment Configuration
- WebSocket host and port configuration
- Browser extension settings
- Debug mode settings
- Logging configuration
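These settings could be gathered into a single object read from environment variables. The sketch below is illustrative only: the variable names (`WEBSCRAPER_WS_HOST` and so on) and the default values, including port 8765, are assumptions and not the project's actual configuration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    debug: bool
    log_level: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Build settings from environment variables, with fallback defaults.

        The variable names and default values are assumptions for this sketch.
        """
        raw_port = env.get("WEBSCRAPER_WS_PORT", "8765")
        if not raw_port.isdecimal() or not 0 < int(raw_port) < 65536:
            raise ValueError(f"invalid WEBSCRAPER_WS_PORT: {raw_port!r}")
        debug = env.get("WEBSCRAPER_DEBUG", "0").lower() in {"1", "true", "yes"}
        return cls(
            host=env.get("WEBSCRAPER_WS_HOST", "127.0.0.1"),
            port=int(raw_port),
            debug=debug,
            # Debug mode implies verbose logging unless overridden.
            log_level=env.get("WEBSCRAPER_LOG_LEVEL", "DEBUG" if debug else "INFO"),
        )
```

Passing a plain dictionary as `env` makes the loader easy to test without touching the real environment.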
### Development Workflow
1. Use uv for dependency management
2. Test WebSocket communication
3. Validate message formats
4. Format code with black
5. Sort imports with isort
6. Update documentation
### Important Commands
```bash
# Install dependencies
uv pip install -e .
# Run example script
python example_extension.py
# Format code
uv run python -m black .
uv run python -m isort .
# Lint code
uv run python -m flake8
```