Python Developer & Platform Specialist with 6+ years building high-throughput ETL pipelines, distributed data platforms, and cloud data warehouses that process millions of records at scale.
About Me
Data Engineer and Python Developer with 6+ years of production experience designing and operating high-throughput ETL pipelines, distributed ingestion systems, and cloud data warehouses.
At X10 Logistics, I built the DSF Data Platform from the ground up: 30+ interconnected systems spanning scraping, validation, marketplace integration, analytics, REST APIs, and reporting, processing 2M+ multilingual product listings daily with full CI/CD and Airflow orchestration.
Hands-on with Snowflake, BigQuery, Databricks, Apache Spark, and medallion lakehouse architecture, deployed across AWS and GCP. MSc in Business Intelligence with a thesis on synthetic data generation.
Work History
Work & Research
30+ production systems spanning data ingestion, validation, marketplace integration, analytics, and API services, built at X10 Logistics and for freelance clients.
Web data ingestion engine capturing 10,000+ product records daily from multiple sources. v2.0 rebuilt with modular spider architecture and proxy orchestration, cutting processing time by 40%. Spiders parse JSON embedded in marketplace scripts, rotate through 3 proxy providers via ScraperAPI, translate via DeepL, and feed a dual-database pipeline (MySQL + MongoDB) with Scrapyd deployment and AWS S3 photo storage.
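As an illustration of the embedded-JSON parsing step mentioned above, here is a minimal sketch using only the standard library. The `extract_embedded_json` name and the assumption that payloads sit in `<script type="application/json">` tags are hypothetical; real marketplace pages vary, and this is not the production spider code.

```python
import json
import re
from typing import Any, List

# Assumption: the payload lives in <script type="application/json"> tags.
SCRIPT_JSON = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_embedded_json(html: str) -> List[Any]:
    """Return every script payload in `html` that parses as valid JSON."""
    payloads = []
    for match in SCRIPT_JSON.finditer(html):
        try:
            payloads.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Skip script bodies that are not valid JSON.
            continue
    return payloads
```

Skipping unparseable script bodies rather than raising keeps one malformed page from stopping a crawl; a production spider would likely log those cases as well.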
Multi-stage validation and cleaning pipeline between raw scrape output and marketplace upload. v2.0 added ML-based anomaly detection and automated error recovery, achieving 99%+ data accuracy. Validates and enriches 14 part categories: deduplicates SKUs, extracts production years, matches manufacturers, detects headlight technology, and resolves part numbers against a 22 MB compatibility database. Master-slave MySQL with transactional rollback.
Synthetic data generation system for testing and model training across the DSF pipeline. Produces realistic product, pricing, and inventory records mirroring production distributions, enabling safe regression testing and ML model training without exposing real business data.
Sales pattern analysis system mining historical order data to surface trends and inform restocking decisions. Aggregates transaction history across multiple seller accounts, identifies seasonal demand patterns, and generates actionable restocking recommendations.
Returns analytics platform identifying root causes of product returns. Correlates return reasons with listing quality, product descriptions, and category attributes. Data-backed improvements drove a 15% reduction in return rate through targeted listing changes.
Market intelligence system aggregating competitor and marketplace data for pricing and positioning decisions. Pulls structured data from multiple marketplace APIs and scrapers, normalizes across categories, and surfaces pricing gaps and demand signals.
Automated competitive intelligence system tracking competitor pricing and stock availability in real time. Monitors target competitor listings across multiple platforms, detects price changes and stockouts, and feeds downstream pricing strategy workflows.
Next-generation batch upload system handling 1,000+ files per batch with validation and error reporting. Manages the OAuth2 lifecycle, groups API calls into 20-item batches, and runs category-specific cron jobs for price, quantity, and description sync. Delists out-of-stock items with 1,024 concurrent requests across multi-shop environments (X10, X102, ATS).
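The 20-item batching can be sketched roughly as follows. The `chunked` helper is a hypothetical name used for illustration; it shows only how a listing stream might be split into API-sized groups, not the actual upload client.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 20) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 45 items split into batches of 20, 20, and 5.
batch_sizes = [len(batch) for batch in chunked(range(45), size=20)]
```

Because the helper works on any iterable, it can consume a database cursor or generator without loading the full catalog into memory.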
Full CRUD automation for eBay listings with real-time price and stock sync across multiple seller accounts. Handles listing creation, updates, revisions, and deletions via the eBay Trading and Sell APIs, with category mapping and condition handling per market.
Multi-account stock and dynamic pricing management system for new parts inventory. Rule-based pricing engine adjusts prices based on competition, margin targets, and stock levels. Dynamic pricing rules reduced overstock by 30%.
Automated product upload and synchronization system for the Allegro marketplace (Poland). Handles product creation, attribute mapping to Allegro category schemas, image upload, and ongoing price and stock sync across the full product catalog.
End-to-end integration for the Express-Teile.de platform covering listings, inventory management, and order processing. Handles product data ingestion, attribute normalization, listing publication, real-time stock updates, and order status synchronization.
Single orchestration system managing listing updates across all integrated marketplaces (eBay, Allegro, Express-Teile). Abstracts platform-specific API differences behind a unified update interface, enabling a single data change to propagate across all channels simultaneously.
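One way to picture the unified update interface is an adapter per marketplace behind a common base class. Everything below (`MarketplaceAdapter`, `RecordingAdapter`, `propagate`) is a hypothetical sketch under that assumption; it runs the adapters sequentially for simplicity, whereas the real system may dispatch them concurrently.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Tuple


class MarketplaceAdapter(ABC):
    """Common interface hiding platform-specific API differences."""

    name: str

    @abstractmethod
    def update_listing(self, sku: str, changes: Dict[str, Any]) -> bool:
        """Apply `changes` to the listing for `sku`; return True on success."""


class RecordingAdapter(MarketplaceAdapter):
    """Stand-in adapter that records calls instead of hitting a real API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def update_listing(self, sku: str, changes: Dict[str, Any]) -> bool:
        self.calls.append((sku, dict(changes)))
        return True


def propagate(
    adapters: Iterable[MarketplaceAdapter], sku: str, changes: Dict[str, Any]
) -> Dict[str, bool]:
    """Send one change to every channel and report per-channel success."""
    results: Dict[str, bool] = {}
    for adapter in adapters:
        try:
            results[adapter.name] = adapter.update_listing(sku, changes)
        except Exception:
            # One failing channel should not block the others.
            results[adapter.name] = False
    return results
```

Returning a per-channel result map makes partial failures visible, so a retry can target only the marketplace that rejected the change.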
eBay campaign automation with percentile-based customer segmentation. Automates Promoted Listings campaigns based on sales velocity, margin, and competitive positioning. v2.0 added cohort analysis and A/B test evaluation, improving campaign performance by 25%.
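Percentile-based segmentation can be sketched with the standard library alone. The segment labels and the choice of four equal-sized buckets are assumptions made for illustration; the actual campaign rules are not described here.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def segment_by_percentile(
    values: Sequence[float],
    labels: Sequence[str] = ("low", "mid", "high", "top"),
) -> List[str]:
    """Assign each value a label based on which quantile bucket it falls in."""
    # quantiles() returns len(labels) - 1 cut points; needs at least 2 values.
    cuts = quantiles(values, n=len(labels))
    # bisect_right gives the bucket index; a value equal to a cut point
    # goes to the upper bucket.
    return [labels[bisect_right(cuts, value)] for value in values]


# Example: segment 100 customers by a hypothetical spend figure of 1..100.
segments = segment_by_percentile(list(range(1, 101)))
```

Using cut points computed from the data itself keeps the segments balanced as the customer base grows, instead of relying on fixed spend thresholds.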
Real-time KPI dashboard monitoring business performance across all platforms. Aggregates sales, inventory, returns, and marketplace data into a unified view for leadership. Tracks daily revenue, listing health, fulfillment rates, and competitor price deltas.
RESTful product management API serving as the core data access layer for product information across the ecosystem. Handles CRUD operations, search and filtering, category management, and attribute enrichment for the full catalog.
Dedicated API for the new parts catalog providing search, filtering, and retrieval endpoints. Supports TecDoc-based compatibility filtering, OEM number lookup, and category-specific attribute queries for downstream marketplace integrations.
Inventory service for the new parts catalog providing real-time stock queries and update endpoints. Supports multi-warehouse stock aggregation, reservation handling, and low-stock alert triggers consumed by downstream systems.
Unified CRUD API handling multi-entity data operations across the platform. Provides a single consistent interface for create, read, update, and delete operations across products, orders, customers, and inventory entities with schema validation and audit logging.
Inventory orchestration layer synchronizing stock state across all downstream systems. Serves as the source of truth for available quantities, coordinates stock reservations between marketplace channels, and emits events to trigger replenishment workflows.
3-phase digital listing system (currently in Phase 3) streamlining product listing creation and publishing across platforms. Guides operators through data entry, image management, and cross-platform submission with validation at each phase before publication.
Internal product management web application for browsing, editing, and managing catalog data. Provides search, filtering, bulk editing, image management, and category assignment for the operations team, backed by the Products-API and Inventory-Management-API.
Full backend powering the Express-Teile.de storefront. Handles inventory availability, product data retrieval, order ingestion, and status updates. Designed for high availability with structured error handling and request validation.
Core backend infrastructure supporting the main business website. Serves product catalog browsing, customer inquiry handling, and dynamic content. Integrated with the inventory and product APIs for real-time availability data.
Extracts and validates millions of business leads from Google. Visits business websites to pull phone numbers (mobile and landline) and emails from home pages, contact pages, and job pages. Deployed on AWS EC2 with S3 storage; leads stored in MongoDB and filterable by keyword-based criteria.
Automated system monitoring multiple Facebook groups based on custom criteria. Filters posts matching specified rules and sends notifications via Discord, email, or other platforms. Supports public groups without login credentials, with fully customizable rules.
Price monitoring bots for Swedish retailers (Hemköp, Handla Willys) tracking product prices over time, identifying trends, and surfacing insights into market behavior and consumer demand to support data-driven stock and pricing decisions.
Monitors Amazon product prices across specific stores and sends real-time notifications whenever a tracked product drops in price. Supports configurable thresholds, multiple products, and multi-channel alerting.
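The configurable threshold check might look something like this. The `PriceDrop` and `detect_drop` names and the 5% default are hypothetical; the sketch covers only the decision of whether a price change should trigger an alert, not fetching prices or sending notifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriceDrop:
    product_id: str
    old_price: float
    new_price: float

    @property
    def drop_pct(self) -> float:
        """Size of the drop as a percentage of the old price."""
        return (self.old_price - self.new_price) / self.old_price * 100


def detect_drop(
    product_id: str,
    old_price: float,
    new_price: float,
    min_drop_pct: float = 5.0,  # assumed default threshold
) -> Optional[PriceDrop]:
    """Return a PriceDrop if the price fell by at least `min_drop_pct`."""
    if old_price <= 0 or new_price >= old_price:
        return None
    drop = PriceDrop(product_id, old_price, new_price)
    return drop if drop.drop_pct >= min_drop_pct else None
```

A percentage threshold avoids alert noise from tiny fluctuations on expensive items while still catching meaningful drops on cheap ones.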
Production-grade RAG system answering questions about Polish business law, tax setup, and B2B contracting. Every answer cites the source paragraphs it draws on, which keeps responses grounded and limits hallucination.
End-to-end generative synthesis pipeline for IoT sensor data in smart-home environments, addressing GDPR and ethical constraints. Trains deep generative models (CTGAN, TVAE, TimeGAN) and enforces post-generation privacy filtering. Evaluated across statistical fidelity, downstream utility, and privacy risk.
Python client for the German commercial register (Handelsregister) that queries, parses, and structures public company registry data for downstream business intelligence workflows.
High-throughput scraper targeting e-commerce platforms. Extracts structured product data including brand, pricing, discounts, and image URLs into clean JSON for BI consumption.
Reviews analysis pipeline for UK marketplaces. Extracts product review datasets, applies supervised learning for keyword and phrase analysis to identify themes (packaging, delivery, etc.), performs flavor identification from review text, and uses the Namsor API for gender-based demographic analysis.
Comparative analysis of e-commerce product categories showing how sales events (Black Friday, National Day, Easter, Christmas) affect product pricing. Includes a Jupyter Notebook analysis with visualizations and a written report.
Android application using server-client socket architecture to create online classroom rooms. Teachers broadcast slides, PDFs, and media to connected students in real time, with server-side control over all clients.
Fiverr Services
Technical Stack
Academic & Certifications
Thesis: Synthetic Data Generation for Privacy-Aware Homecare Support. Implemented GANs, CTGAN, VAEs, and GTransformers to create synthetic sensor data for downstream predictive ML models.
Foundation in software engineering, database systems, networking, and computer science fundamentals.
Get In Touch
Open to Data Engineering roles, freelance pipeline projects, and cloud platform consulting. I respond within 24 hours.
hamzaraja983@gmail.com