ronaldeddings WebScraper .cursorrules file for Python

# WebScraper Development Guidelines

## Project Overview
WebScraper is a browser automation tool that uses WebSocket communication between a Python backend and a browser extension to enable automated web interactions.

## Critical Requirements

### Extension Requirements
- Content Script MUST be used for page interactions
- Eval MUST be enabled for dynamic script execution
- Content Script should be injected at document_start
- All script execution must happen through the Content Script

### JavaScript Execution via Chrome Extension Guidelines
1. Browser Context Rules:
   - All JS files in `/commands/js` must be pure browser JavaScript
   - No Node.js/CommonJS/ES modules (no require/import/export)
   - No external dependencies that need bundling
   - Code must be self-contained and executable via eval()
   - Return values must be JSON-serializable objects

2. Code Structure:
   - Each file should contain all needed functions
   - Main execution should be at the end of file
   - Use direct return statements (no Promises/async)
   - Avoid global scope pollution
   - Use IIFE if needed for scope isolation

3. Return Format:
   - Return plain objects only
   - No class instances or complex types
   - No DOM elements or functions
   - No circular references
   - Ensure all data is serializable

4. Browser APIs:
   - Only use standard browser APIs
   - No extension-specific APIs in JS files
   - Handle API errors gracefully
   - Fallback for unsupported features
   - Test in isolated context

5. Security Considerations:
   - No sensitive data in returned objects
   - Sanitize all input/output
   - No external script loading
   - No XSS-vulnerable constructs
   - Follow CSP restrictions
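
The return-format rules above can also be checked from the Python side. The following is a minimal sketch, assuming the backend receives each script result as an already-decoded Python value; `is_plain_json` is a hypothetical helper name, not part of the project.

```python
import json
from typing import Any


def is_plain_json(value: Any) -> bool:
    """Return True if value survives a JSON round trip unchanged.

    Hypothetical helper: rejects class instances, functions, circular
    references, NaN/Infinity, and anything JSON would silently rewrite
    (for example tuples or non-string dictionary keys).
    """
    try:
        # allow_nan=False makes NaN and Infinity raise instead of emitting
        # tokens that are not valid JSON.
        encoded = json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        # TypeError: unsupported type. ValueError: circular reference or NaN.
        return False
    return json.loads(encoded) == value
```

A result such as `{"title": "Example", "links": ["a", "b"]}` passes, while a dictionary that contains itself or holds an arbitrary object does not.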

### Manifest Requirements
- Maintain the following content_security_policy in manifest.json:
```json
"content_security_policy": {
  "extension_pages": "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'",
  "sandbox": "sandbox allow-scripts allow-forms allow-popups allow-modals; script-src 'self' 'unsafe-inline' 'unsafe-eval'; child-src 'self';"
}
```

### Architecture Principles
- Python backend should maintain primary control
- Extension serves as a Remote Access Tool (RAT)
- Maximize control delegation to Python backend
- WebSocket communication should be reliable and secure

### Development Guidelines
1. Iterative Development:
   - Focus on adding features or careful refactoring
   - Complete codebase scan before refactoring
   - Ensure changes maintain existing functionality

2. Code Quality:
   - Follow type-safe practices
   - Maintain proper error handling
   - Keep WebSocket communication robust
   - Document all changes thoroughly

3. Security:
   - Maintain the manifest CSP required for script execution
   - Keep content scripts in isolated world
   - Validate all commands from Python backend

### Agent Responsibilities
1. Proactive Execution:
   - Minimize user intervention
   - Offer to run necessary commands
   - Handle build and testing automatically
   - Provide clear success/failure feedback

2. Code Review:
   - Scan entire codebase before changes
   - Verify manifest.json integrity
   - Check content script functionality
   - Validate WebSocket communication

3. Error Resolution:
   - Address root causes
   - Maintain security requirements
   - Preserve architecture principles
   - Document solutions thoroughly

## Project Setup and Running

### WebScraperExtension Setup
1. Build Requirements:
   - Node.js and Bun package manager
   - TypeScript for type checking
   - Webpack for bundling
   - Tailwind CSS for styling

2. Build Process:
   - Clean dist directory before each build
   - Build content script separately to maintain isolation
   - Copy static assets including manifest.json
   - Verify CSP rules are properly copied

3. Extension Loading:
   - Load as unpacked extension in Chrome/Brave
   - Verify background service worker starts
   - Check content script injection
   - Validate WebSocket connection

4. Common Commands:
```bash
# Clean and build extension
rm -rf dist && mkdir dist && bun run build

# Format code
bun run format

# Run type checking
bun run typecheck
```

### WebScraperPy Setup
1. Environment Setup:
   - Python 3.12+ required
   - Virtual environment recommended
   - uv package manager for dependencies
   - Development mode installation

2. Running Process:
   - Start Python backend first
   - Wait for WebSocket server to start
   - Verify extension connection
   - Handle command execution

3. Testing Flow:
   - Run conversation extraction tests
   - Verify WebSocket communication
   - Check script execution
   - Validate error handling

4. Common Commands:
```bash
# Setup virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install package
uv pip install -e .

# Run tests
python test_conversation.py

# Format and lint code
black . && ruff check .
```

### Running Order
1. Build extension:
   ```bash
   cd WebScraperExtension
   bun run build
   ```

2. Load extension in browser:
   - Navigate to chrome://extensions
   - Enable Developer Mode
   - Load unpacked from dist directory

3. Start Python backend:
   ```bash
   cd WebScraperPy
   python example_extension.py
   ```

4. Verify Connection:
   - Check the extension's background service worker console
   - Verify WebSocket connection in Python logs
   - Test basic navigation command

## Python Development Rules

### Critical Requirements
- Use Python 3.12+ features
- Maintain async/await patterns
- Handle WebSocket timeouts
- Implement proper error handling

### Code Standards
- Follow PEP 8 style guide
- Use type hints in all new code
- Document all classes and methods
- Keep WebSocket handlers and browser commands single-purpose
- Maximum line length: 88 characters (black formatter)

### Project Structure
```
WebScraper/
├── WebScraperPy/           # Python backend
│   ├── webscraper/        
│   │   ├── browser/       # Browser implementations
│   │   ├── core/          # Core interfaces and types
│   │   └── utils/         # Shared utilities
│   ├── tests/             # Test suite
│   └── example_*.py       # Example scripts
│
└── WebScraperExtension/    # Browser extension
    └── src/               # Extension source code
```

### Development Guidelines

#### Python Standards
- Python version: ^3.12
- Use uv for dependency management
- Use websockets for WebSocket communication
- Use pydantic for message validation
- Use loguru for logging
- Follow black formatting rules

#### WebSocket Communication
- All messages must use ExtensionMessage format
- Handle connection errors gracefully
- Implement proper message validation
- Log all WebSocket events for debugging
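
The validation rules above might look roughly like the sketch below. It is a standard-library stand-in: in the project the same checks would presumably live in a pydantic model. The fields `id`, `type`, and `payload` are assumptions for illustration, since the actual `ExtensionMessage` schema is not shown here.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ExtensionMessage:
    """Hypothetical envelope; the real field names may differ."""

    id: str
    type: str
    payload: dict[str, Any] = field(default_factory=dict)


def parse_message(raw: str) -> ExtensionMessage:
    """Decode one WebSocket text frame, raising ValueError on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"message is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")
    for key in ("id", "type"):
        if not isinstance(data.get(key), str) or not data[key]:
            raise ValueError(f"message field {key!r} must be a non-empty string")
    payload = data.get("payload", {})
    if not isinstance(payload, dict):
        raise ValueError("message field 'payload' must be an object")
    return ExtensionMessage(id=data["id"], type=data["type"], payload=payload)
```

Rejecting malformed frames at the boundary keeps the command handlers free of defensive checks.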

#### Browser Automation
- Implement commands through ExtensionBrowser class
- Follow the BrowserInterface protocol
- Handle browser capabilities consistently
- Validate all browser commands
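
As an illustration of coding against the protocol rather than a concrete class, here is a sketch built on `typing.Protocol`. The method names `navigate` and `execute_script` and the `FakeBrowser` class are assumptions; the project's actual `BrowserInterface` may define different members.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BrowserInterface(Protocol):
    """Hypothetical subset of the protocol; real method names may differ."""

    async def navigate(self, url: str) -> dict[str, Any]: ...

    async def execute_script(self, source: str) -> dict[str, Any]: ...


class FakeBrowser:
    """In-memory stand-in that records commands instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    async def navigate(self, url: str) -> dict[str, Any]:
        # Validate the command before it would go over the wire.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL scheme: {url!r}")
        self.sent.append(("navigate", url))
        return {"ok": True, "url": url}

    async def execute_script(self, source: str) -> dict[str, Any]:
        if not source.strip():
            raise ValueError("script source must not be empty")
        self.sent.append(("execute_script", source))
        return {"ok": True}


async def open_page(browser: BrowserInterface, url: str) -> dict[str, Any]:
    """Depends only on the protocol, so any conforming browser works."""
    return await browser.navigate(url)
```

A fake like this also lets command validation be tested without a running extension.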

#### Error Handling
- Use custom exceptions for WebSocket errors
- Provide context in error messages
- Log connection and command errors
- Handle browser disconnections gracefully
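
A small exception hierarchy is one way to satisfy the points above. In this sketch the class names, the `run_command` helper, and the use of the standard `logging` module (standing in for loguru) are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("webscraper")


class WebScraperError(Exception):
    """Base class so callers can catch every project error at once."""


class ConnectionLostError(WebScraperError):
    """The extension disconnected while a command was pending."""

    def __init__(self, command: str, reason: str) -> None:
        super().__init__(f"connection lost during {command!r}: {reason}")
        # Keep the context as attributes so handlers can log or retry.
        self.command = command
        self.reason = reason


def run_command(send, command: str):
    """Call send(command); log and return None if the browser went away."""
    try:
        return send(command)
    except ConnectionLostError as exc:
        logger.warning("command %s aborted: %s", exc.command, exc.reason)
        return None
```

Carrying the command name and reason on the exception keeps error messages specific without parsing strings later.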

#### Documentation
- Document all message types
- Document browser capabilities
- Provide WebSocket connection examples
- Include browser command examples

#### Environment Configuration
- WebSocket host and port configuration
- Browser extension settings
- Debug mode settings
- Logging configuration
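
The settings listed above could be gathered in one place, for example by reading environment variables into a frozen dataclass. The variable names (`WEBSCRAPER_WS_HOST` and so on) and the default values are placeholders, not the project's actual configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; defaults are placeholders."""

    host: str = "127.0.0.1"
    port: int = 8765
    debug: bool = False
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style key/value pairs."""
    port_text = env.get("WEBSCRAPER_WS_PORT", "8765")
    if not port_text.isdecimal() or not 0 < int(port_text) < 65536:
        raise ValueError(f"invalid port: {port_text!r}")
    debug = env.get("WEBSCRAPER_DEBUG", "").lower() in {"1", "true", "yes"}
    # Debug mode lowers the default log level unless one is set explicitly.
    default_level = "DEBUG" if debug else "INFO"
    return Settings(
        host=env.get("WEBSCRAPER_WS_HOST", "127.0.0.1"),
        port=int(port_text),
        debug=debug,
        log_level=env.get("WEBSCRAPER_LOG_LEVEL", default_level).upper(),
    )
```

Passing the mapping as a parameter makes the loader easy to test with a plain dictionary instead of the real environment.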

### Development Workflow
1. Use uv for dependency management
2. Test WebSocket communication
3. Validate message formats
4. Format code with black
5. Sort imports with isort
6. Update documentation

### Important Commands
```bash
# Install dependencies
uv pip install -e .

# Run example script
python example_extension.py

# Format code
uv run python -m black .
uv run python -m isort .

# Lint code
uv run python -m flake8
```

Languages:

CSS: 3.1KB
HTML: 1.4KB
JavaScript: 14.5KB
Python: 47.5KB
TypeScript: 39.8KB
Created: 1/22/2025
Updated: 1/22/2025