Available for opportunities

Hamza Anwar
Data Engineer

Python Developer & Platform Specialist with 6+ years building high-throughput ETL pipelines, distributed data platforms, and cloud data warehouses that process millions of records at scale.

Szczecin, Poland 🇵🇱 · EU Blue Card
Spark · Airflow · Snowflake · BigQuery

About Me

Building production data
infrastructure at scale

Data Engineer and Python Developer with 6+ years of production experience designing and operating high-throughput ETL pipelines, distributed ingestion systems, and cloud data warehouses.

At X10 Logistics, I built the DSF Data Platform from the ground up: 30+ interconnected systems spanning scraping, validation, marketplace integration, analytics, REST APIs, and reporting, processing 2M+ multilingual product listings daily with full CI/CD and Airflow orchestration.

Hands-on with Snowflake, BigQuery, Databricks, Apache Spark, and medallion lakehouse architecture, deployed across AWS and GCP. MSc in Business Intelligence with a thesis on synthetic data generation.

🗓️ 6+ Years of Experience
2M+ Records Processed Daily
🏗️ 30+ Production Systems Built
🎓 MSc Business Intelligence

Work History

Experience

Data Engineer & Python Developer
Dec 2023 – Mar 2026
X10 Logistics sp. z o.o.
📍 Szczecin, Poland · Onsite
Data Platform & ETL Engineering
  • Architected end-to-end ETL pipelines in Python for product, inventory, and sales data, reliably ingesting and synchronising 2M+ multilingual product listings across multiple e-commerce platforms.
  • Implemented data validation, transformation, and normalisation logic (Pandas, NumPy) against relational schemas in PostgreSQL and MySQL, reducing downstream reporting errors.
  • Orchestrated scheduled pipeline workflows with Apache Airflow; diagnosed processing failures and applied root-cause fixes to maintain stable production delivery.
  • Deployed containerised pipeline environments using Docker and Jenkins CI/CD, enabling repeatable, auditable deployments.
  • Built distributed data processing workflows with Apache Spark and PySpark for large-scale transformation of datasets containing millions of records.
  • Integrated Snowflake & BigQuery with medallion architecture (Bronze / Silver / Gold) for reliable, incremental data processing.
Data Quality & Observability
  • Implemented structured logging and monitoring across all pipeline stages, proactively identifying degradation to sustain SLA compliance.
  • Delivered KPI dashboards and reporting pipelines supporting sales, logistics, and operations leadership.
  • Applied agile engineering practices: feature branching, pull request reviews, TDD, and iterative CI/CD via Jenkins and GitHub.
Python · Airflow · Spark · Snowflake · BigQuery · Docker · PostgreSQL
Python Developer
Jul 2019 – Mar 2023
GulzarSoft
📍 Gujrat, Pakistan
ETL, Automation & Team Leadership
  • Designed and built modular, high-performance web scraping systems using Scrapy, Selenium, and BeautifulSoup to handle dynamic content, proxy rotation, and structured parsing workflows at scale.
  • Built and deployed automated ETL ingestion workflows across multiple industries with strong schema compliance standards.
  • Led a team of developers; established code review workflows and version control practices (Git/GitHub/GitLab).
  • Mentored junior developers on Python, Scrapy architecture, and ETL design, elevating team-wide technical quality.
Python · Scrapy · Selenium · ETL · Git · Team Lead

Work & Research

Projects

30+ production systems spanning data ingestion, validation, marketplace integration, analytics, and API services — built at X10 Logistics and for freelance clients.

Private / NDA · Open Source · Academic
Data Infrastructure
Core ingestion, validation, and data quality pipelines
🕷️
DSF-Scrapers
Automotive Parts Ingestion Engine · v1 + v2.0
Private / NDA

Web data ingestion engine capturing 10,000+ product records daily from multiple sources. v2.0 rebuilt with modular spider architecture and proxy orchestration, cutting processing time by 40%. Spiders parse JSON embedded in marketplace scripts, rotate through 3 proxy providers via ScraperAPI, translate via DeepL, and feed a dual-database pipeline (MySQL + MongoDB) with Scrapyd deployment and AWS S3 photo storage.

10K+ records/day · 40% faster v2.0
Python · Scrapy · MySQL · MongoDB · AWS S3 · DeepL · ScraperAPI · asyncio
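
A minimal sketch of the embedded-JSON parsing step mentioned above, assuming the page carries its product data in a `<script type="application/json">` tag. The sample HTML, the `extract_embedded_json` helper, and the field names are hypothetical illustrations; the production spiders are private and may work differently.

```python
import json
import re

# Assumption: the payload sits inside a <script type="application/json"> tag.
SCRIPT_JSON_RE = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def extract_embedded_json(html):
    """Return every JSON object found in application/json script tags."""
    payloads = []
    for match in SCRIPT_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the whole page
        if isinstance(data, dict):
            payloads.append(data)
    return payloads

# Hypothetical page fragment: one valid payload and one broken one.
SAMPLE_HTML = """
<html><body>
<script type="application/json" id="listing">{"sku": "HL-1001", "price": 129.9}</script>
<script type="application/json">{not valid json}</script>
</body></html>
"""

records = extract_embedded_json(SAMPLE_HTML)
```

Skipping malformed payloads keeps one bad script tag from discarding the rest of the page; a real spider would likely log the failure as well.
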
🔬
DSF-Cleaners
Automotive Data Validation Pipeline · v1 + v2.0
Private / NDA

Multi-stage validation and cleaning pipeline between raw scrape output and marketplace upload. v2.0 added ML-based anomaly detection and automated error recovery, achieving 99%+ data accuracy. Validates and enriches 14 part categories: deduplicates SKUs, extracts production years, matches manufacturers, detects headlight technology, and resolves part numbers against a 22 MB compatibility database. Master-slave MySQL with transactional rollback.

99%+ data accuracy
Python · Pandas · NumPy · scikit-learn · MySQL · MongoDB
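
Two of the cleaning steps named above, SKU deduplication and production-year extraction, can be sketched roughly as follows. This is an illustration under assumptions, not the private pipeline: the title format, the SKU normalisation rule, and both helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: titles carry a year range such as "2012-2016" or "2012–2016".
YEAR_RANGE_RE = re.compile(r"\b((?:19|20)\d{2})\s*[-–]\s*((?:19|20)\d{2})\b")

def extract_production_years(title: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) if the title contains a plausible year range."""
    match = YEAR_RANGE_RE.search(title)
    if not match:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    return (start, end) if start <= end else None

def deduplicate_by_sku(rows):
    """Keep the first row per normalised SKU (trimmed, upper-cased)."""
    seen = set()
    unique = []
    for row in rows:
        key = row["sku"].strip().upper()
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique

# Hypothetical scrape output with one duplicate SKU.
rows = [
    {"sku": "hl-1001 ", "title": "Headlight LED VW Golf 2012-2016"},
    {"sku": "HL-1001", "title": "Headlight LED VW Golf 2012-2016 (duplicate)"},
    {"sku": "HL-2002", "title": "Tail light Audi A4"},
]
cleaned = deduplicate_by_sku(rows)
years = [extract_production_years(row["title"]) for row in cleaned]
```
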
🧬
DSF-NP-DataGeneration
Synthetic Data Pipeline for Testing & Training
Private / NDA

Synthetic data generation system for testing and model training across the DSF pipeline. Produces realistic product, pricing, and inventory records mirroring production distributions, enabling safe regression testing and ML model training without exposing real business data.

Python · Pandas · NumPy · Data Generation · ETL
Analytics & Intelligence
Business intelligence, market analysis, and competitive tracking
📈
DSF-SalesHistoryAnalyses
Sales Pattern Analysis System
Private / NDA

Sales pattern analysis system mining historical order data to surface trends and inform restocking decisions. Aggregates transaction history across multiple seller accounts, identifies seasonal demand patterns, and generates actionable restocking recommendations.

Python · Pandas · Matplotlib · MySQL · Jupyter · Data Analysis
↩️
DSF-SalesReturns
Returns Analytics Platform
Private / NDA

Returns analytics platform identifying root causes of product returns. Correlates return reasons with listing quality, product descriptions, and category attributes. Data-backed improvements drove a 15% reduction in return rate through targeted listing changes.

15% return rate reduction
Python · Pandas · SQL · Matplotlib · Data Analysis
📊
DSF-MarketplaceDataAnalyses
Market Intelligence System
Private / NDA

Market intelligence system aggregating competitor and marketplace data for pricing and positioning decisions. Pulls structured data from multiple marketplace APIs and scrapers, normalizes across categories, and surfaces pricing gaps and demand signals.

Python · Scrapy · Pandas · MySQL · Data Analysis
🔍
DSF-CompetitorScraping
Automated Competitive Intelligence
Private / NDA

Automated competitive intelligence system tracking competitor pricing and stock availability in real time. Monitors target competitor listings across multiple platforms, detects price changes and stockouts, and feeds downstream pricing strategy workflows.

Python · Scrapy · Selenium · MongoDB · BeautifulSoup
Marketplace Integration
Multi-platform listing automation and inventory synchronization
🚀
DSF-Uploaders 2.0
Next-Gen Batch Upload System
Private / NDA

Next-gen batch upload system handling 1,000+ files per batch with validation and error reporting. Manages OAuth2 lifecycle, batches 20-item API calls, and runs category-specific crons for price, quantity, and description sync. Delists out-of-stock items with 1,024 concurrent requests across multi-shop environments (X10, X102, ATS).

1,000+ files/batch · 1,024 concurrent reqs
Python · eBay Sell API · MySQL · MongoDB · asyncio
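
The batching and bounded-concurrency pattern described above can be sketched with `asyncio`. The batch size of 20 and the cap of 1,024 come from the description; `send_batch` is a hypothetical stand-in for the real marketplace call, which is not shown here.

```python
import asyncio

BATCH_SIZE = 20          # items per API call, as in the description
MAX_CONCURRENCY = 1024   # upper bound on in-flight requests

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def send_batch(batch, semaphore):
    """Hypothetical stand-in for one marketplace API call."""
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for the real HTTP request
        return len(batch)

async def upload_all(items):
    """Split items into batches and send them with bounded concurrency."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    tasks = [send_batch(batch, semaphore) for batch in chunked(items, BATCH_SIZE)]
    # gather() returns results in the same order the tasks were passed in.
    return await asyncio.gather(*tasks)

batch_sizes = asyncio.run(upload_all([f"SKU-{i}" for i in range(45)]))
```

The semaphore caps how many requests are in flight at once, so raising or lowering `MAX_CONCURRENCY` tunes load on the API without touching the batching logic.
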
🛒
DSF-eBayUpdators
eBay Full CRUD Automation
Private / NDA

Full CRUD automation for eBay listings with real-time price and stock sync across multiple seller accounts. Handles listing creation, updates, revisions, and deletions via the eBay Trading and Sell APIs, with category mapping and condition handling per market.

Python · eBay API · MySQL · Automation
📦
DSF-NP-StockPrices
Multi-Account Stock & Price Management
Private / NDA

Multi-account stock and dynamic pricing management system for new parts inventory. Rule-based pricing engine adjusts prices based on competition, margin targets, and stock levels. Dynamic pricing rules reduced overstock by 30%.

30% overstock reduction
Python · MySQL · Pricing Engine · Automation
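
A rule-based repricing step of the kind described above might look like the sketch below. The three rules, their order, and every threshold are assumptions chosen for illustration; the production rules are private.

```python
from dataclasses import dataclass

@dataclass
class Item:
    cost: float
    price: float
    stock: int
    competitor_price: float

# All three constants are hypothetical values for this sketch.
MIN_MARGIN = 0.15          # price floor: 15% over cost
OVERSTOCK_THRESHOLD = 50   # stock level that triggers a markdown
OVERSTOCK_DISCOUNT = 0.10  # markdown applied to overstocked items

def reprice(item: Item) -> float:
    price = item.price
    # Rule 1: do not sit above the competitor.
    if item.competitor_price < price:
        price = item.competitor_price
    # Rule 2: mark down overstocked items.
    if item.stock > OVERSTOCK_THRESHOLD:
        price *= 1 - OVERSTOCK_DISCOUNT
    # Rule 3: never go below the margin floor.
    floor = item.cost * (1 + MIN_MARGIN)
    return round(max(price, floor), 2)

overstocked = reprice(Item(cost=50, price=100, stock=80, competitor_price=90))
margin_bound = reprice(Item(cost=80, price=100, stock=10, competitor_price=85))
```

Applying the margin floor last means no combination of competitor matching and markdowns can push a price below cost plus margin.
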
🛍️
DSF-AllegroUploaders
Allegro Marketplace Automation
Private / NDA

Automated product upload and synchronization system for the Allegro marketplace (Poland). Handles product creation, attribute mapping to Allegro category schemas, image upload, and ongoing price and stock sync across the full product catalog.

Python · Allegro API · MySQL · ETL · Automation
⚙️
DSF-ExpressTeile
Express-Teile.de End-to-End Integration
Private / NDA

End-to-end integration for the Express-Teile.de platform covering listings, inventory management, and order processing. Handles product data ingestion, attribute normalization, listing publication, real-time stock updates, and order status synchronization.

Python · REST API · MySQL · ETL · Order Management
🔄
DSF-UpdatorSystems
Unified Multi-Platform Uploader
Private / NDA

Single orchestration system managing listing updates across all integrated marketplaces (eBay, Allegro, Express-Teile). Abstracts platform-specific API differences behind a unified update interface, enabling a single data change to propagate across all channels simultaneously.

Python · eBay API · Allegro API · MySQL · Microservices
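
The unified update interface described above is essentially an adapter pattern. The sketch below is a hypothetical illustration: the class names and payload shapes are invented and do not reflect the real eBay or Allegro request schemas.

```python
from abc import ABC, abstractmethod

class MarketplaceAdapter(ABC):
    """Common interface; each platform translates a change into its own payload."""

    name: str

    @abstractmethod
    def build_payload(self, sku: str, changes: dict) -> dict:
        ...

class EbayAdapter(MarketplaceAdapter):
    name = "ebay"

    def build_payload(self, sku, changes):
        # Hypothetical payload shape, not the real eBay request schema.
        return {"sku": sku, "price": {"value": changes["price"], "currency": "EUR"}}

class AllegroAdapter(MarketplaceAdapter):
    name = "allegro"

    def build_payload(self, sku, changes):
        # Hypothetical payload shape, not the real Allegro request schema.
        return {"external_id": sku, "sellingMode": {"price": changes["price"]}}

def propagate(sku, changes, adapters):
    """Fan one change out to every channel; a real system would also send each payload."""
    return {adapter.name: adapter.build_payload(sku, changes) for adapter in adapters}

payloads = propagate("HL-1001", {"price": 119.0}, [EbayAdapter(), AllegroAdapter()])
```

This sketch builds the payloads sequentially; sending them concurrently would be a separate concern layered on top of the same interface.
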
Marketing & Reporting
Campaign automation, KPI tracking, and business performance dashboards
📣
DSF-MarketingProject
eBay Campaign Automation · v1 + v2.0
Private / NDA

eBay campaign automation with percentile-based customer segmentation. Automates Promoted Listings campaigns based on sales velocity, margin, and competitive positioning. v2.0 added cohort analysis and A/B test evaluation, improving campaign performance by 25%.

25% campaign improvement
Python · eBay API · Pandas · Data Analysis · Campaign Automation
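
Percentile-based segmentation, as mentioned above, can be illustrated with a small standard-library sketch. The cut-offs (top 20% as "high", next 40% as "mid") and the sample spend figures are assumptions, not the values used in the project.

```python
from bisect import bisect_right

def percentile_rank(value, sorted_values):
    """Share of observations less than or equal to `value`, in (0, 1]."""
    return bisect_right(sorted_values, value) / len(sorted_values)

def segment_customers(spend_by_customer, high=0.8, low=0.4):
    """Label each customer by where their spend falls in the distribution."""
    ordered = sorted(spend_by_customer.values())
    segments = {}
    for customer, spend in spend_by_customer.items():
        rank = percentile_rank(spend, ordered)
        if rank > high:
            segments[customer] = "high"
        elif rank > low:
            segments[customer] = "mid"
        else:
            segments[customer] = "low"
    return segments

# Hypothetical spend per customer.
spend = {"c1": 20, "c2": 35, "c3": 50, "c4": 120, "c5": 400}
segments = segment_customers(spend)
```

Ranking by percentile rather than by absolute spend keeps the segments stable when a few very large orders skew the distribution.
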
📈
Reporting-Dashboard
Real-Time KPI Dashboard
Private / NDA

Real-time KPI dashboard monitoring business performance across all platforms. Aggregates sales, inventory, returns, and marketplace data into a unified view for leadership. Tracks daily revenue, listing health, fulfillment rates, and competitor price deltas.

Python · Flask · SQL · JavaScript · Bootstrap
API Services
RESTful microservices powering product, inventory, and data operations
🔌
Products-API
Core Product Management Service
Private / NDA

RESTful product management API serving as the core data access layer for product information across the ecosystem. Handles CRUD operations, search and filtering, category management, and attribute enrichment for the full catalog.

Python · FastAPI · MySQL · REST · JSON
🔌
NP-Product-API
New Parts Catalog API
Private / NDA

Dedicated API for the new parts catalog providing search, filtering, and retrieval endpoints. Supports TecDoc-based compatibility filtering, OEM number lookup, and category-specific attribute queries for downstream marketplace integrations.

Python · FastAPI · MySQL · REST · TecDoc
📦
NP-Products-Inventory-API
Inventory Service — Real-Time Stock
Private / NDA

Inventory service for the new parts catalog providing real-time stock queries and update endpoints. Supports multi-warehouse stock aggregation, reservation handling, and low-stock alert triggers consumed by downstream systems.

Python · FastAPI · MySQL · REST · Inventory
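
The stock logic described above (multi-warehouse aggregation, reservations, low-stock alerts) can be sketched in plain Python. The data shapes, the threshold, and the function names are hypothetical; the actual service exposes this behaviour through API endpoints.

```python
from collections import defaultdict

LOW_STOCK_THRESHOLD = 5  # hypothetical alert threshold

def available_stock(stock_rows, reservations):
    """Sum quantities per SKU across warehouses, then subtract reserved units."""
    totals = defaultdict(int)
    for sku, _warehouse, quantity in stock_rows:
        totals[sku] += quantity
    for sku, reserved in reservations.items():
        totals[sku] -= reserved
    return dict(totals)

def low_stock_alerts(available):
    """Return the SKUs at or below the alert threshold, sorted for stable output."""
    return sorted(sku for sku, qty in available.items() if qty <= LOW_STOCK_THRESHOLD)

# Hypothetical rows: (sku, warehouse, quantity).
stock_rows = [("A", "WH1", 10), ("A", "WH2", 4), ("B", "WH1", 3)]
available = available_stock(stock_rows, reservations={"A": 2})
alerts = low_stock_alerts(available)
```
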
🔌
DATA-APIs-CRUD
Unified Multi-Entity CRUD API
Private / NDA

Unified CRUD API handling multi-entity data operations across the platform. Provides a single consistent interface for create, read, update, and delete operations across products, orders, customers, and inventory entities with schema validation and audit logging.

Python · Flask · MySQL · REST · Microservices
⚙️
INVENTORY-MANAGEMENT-API
Inventory Orchestration Layer
Private / NDA

Inventory orchestration layer synchronizing stock state across all downstream systems. Serves as the source of truth for available quantities, coordinates stock reservations between marketplace channels, and emits events to trigger replenishment workflows.

Python · FastAPI · MySQL · Event-Driven · Microservices
Web Applications
Internal tools, storefronts, and operational interfaces
🌐
DSF-DigitalListings
3-Phase Digital Listing System
Private / NDA

3-phase digital listing system (currently in Phase 3) streamlining product listing creation and publishing across platforms. Guides operators through data entry, image management, and cross-platform submission with validation at each phase before publication.

Python · Flask · MySQL · JavaScript · Bootstrap
🗂️
PRODUCT-WEB-APP
Internal Product Management Interface
Private / NDA

Internal product management web application for browsing, editing, and managing catalog data. Provides search, filtering, bulk editing, image management, and category assignment for the operations team, backed by the Products-API and Inventory-Management-API.

Python · Flask · JavaScript · Bootstrap · MySQL
⚙️
Express-Teile Inventory Backend
Express-Teile.de Storefront Backend
Private / NDA

Full backend powering the Express-Teile.de storefront. Handles inventory availability, product data retrieval, order ingestion, and status updates. Designed for high availability with structured error handling and request validation.

Python · FastAPI · MySQL · REST API · E-Commerce
🌐
WEBSITE-BACKEND
Core Business Website Infrastructure
Private / NDA

Core backend infrastructure supporting the main business website. Serves product catalog browsing, customer inquiry handling, and dynamic content. Integrated with the inventory and product APIs for real-time availability data.

Python · Flask · MySQL · REST API · HTML/CSS · JavaScript
Freelance & Client Projects
Data pipelines, scraping systems, and automation built for external clients
🤖
B2B Lead Generation Scraper
Scraping · AWS EC2 · MongoDB
Private / NDA

Extracts and validates millions of business leads from Google. Visits business websites to pull phone numbers (mobile and landline) and emails from home pages, contact pages, and job pages. Deployed on AWS EC2 with S3 storage; leads stored in MongoDB and filterable by keyword-based criteria.

Python · Scrapy · Selenium · MongoDB · AWS EC2 · AWS S3
👥
GroupWatch
Facebook Group Monitoring System
Private / NDA

Automated system monitoring multiple Facebook groups based on custom criteria. Filters posts matching specified rules and sends notifications via Discord, email, or other platforms. Supports public groups without login credentials, with fully customizable rules.

Python · Selenium · Automation · Discord
📊
Swedish Retailer Price Monitoring
Hemköp · Handla Willys
Private / NDA

Price monitoring bots for Swedish retailers (Hemköp, Handla Willys) tracking product prices over time, identifying trends, and surfacing insights into market behavior and consumer demand to support data-driven stock and pricing decisions.

Python · Scrapy · Data Analysis · Trend Detection
🔔
Amazon Price Alert
Real-Time Notification System
Private / NDA

Monitors Amazon product prices across specific stores and sends real-time notifications whenever a tracked product drops in price. Supports configurable thresholds, multiple products, and multi-channel alerting.

Python · Web Scraping · Notifications
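
The threshold check behind a price alert like the one above can be sketched in a few lines. The 5% default, the product names, and the `price_drops` helper are hypothetical; scraping and notification delivery are left out.

```python
def price_drops(previous, current, min_drop_pct=5.0):
    """Return (product, drop %) for every price that fell by at least the threshold."""
    alerts = []
    for product, old_price in previous.items():
        new_price = current.get(product)
        if new_price is None or new_price >= old_price:
            continue  # product missing from the latest scrape, or no drop
        drop_pct = (old_price - new_price) / old_price * 100
        if drop_pct >= min_drop_pct:
            alerts.append((product, round(drop_pct, 1)))
    return alerts

# Hypothetical price snapshots from two consecutive runs.
previous = {"kettle": 40.0, "lamp": 20.0, "mug": 10.0}
current = {"kettle": 30.0, "lamp": 19.5, "mug": 12.0}
alerts = price_drops(previous, current)
```
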
Open Source
Publicly available tools and research implementations
⚖️
Polish Law RAG Assistant
Live on Streamlit
Open Source

Production-grade RAG system answering questions about Polish business law, tax setup, and B2B contracting, with cited source paragraphs on every answer. Built to avoid hallucination.

Python · RAG · LLM · Streamlit
Live Demo · GitHub
🧬
Synthetic Data Generation for AAL
MSc Thesis · Högskolan Dalarna
Open Source

End-to-end generative synthesis pipeline for IoT sensor data in smart-home environments, addressing GDPR and ethical constraints. Trains deep generative models (CTGAN, TVAE, TimeGAN) and enforces post-generation privacy filtering. Evaluated across statistical fidelity, downstream utility, and privacy risk.

Python · CTGAN · TVAE · TimeGAN · Privacy · Jupyter
GitHub
🏢
Handelsregister API
German Commercial Registry Client
Open Source

Python client for the German commercial register (Handelsregister) that queries, parses, and structures public company registry data for downstream business intelligence workflows.

Python · API · ETL · Business Data
GitHub
🛍️
Omnilytics E-commerce Scraper
Product Data Extraction
Open Source

High-throughput scraper targeting e-commerce platforms. Extracts structured product data including brand, pricing, discounts, and image URLs into clean JSON for BI consumption.

Python · Scrapy · Selenium · JSON
GitHub
Academic
University coursework and research projects
🛒
Marketplace Review Analysis
UK Market Intelligence
Academic

Reviews analysis pipeline for UK marketplaces. Extracts product review datasets, applies supervised learning for keyword and phrase analysis to identify themes (packaging, delivery, etc.), performs flavor identification from review text, and uses the Namsor API for gender-based demographic analysis.

Python · NLP · Supervised Learning · Namsor API · Data Analysis
📉
Sales Events & Consumer Behavior
Data Analysis · Jupyter
Academic

Comparative analysis of e-commerce product categories showing how sales events (Black Friday, National Day, Easter, Christmas) impact product pricing. Includes data analysis in Jupyter Notebook with visualization graphs and a written report.

Python · Jupyter · Pandas · Matplotlib · Data Analysis
📱
Share and Control Media
Android · Classroom Broadcasting App
Academic

Android application using server-client socket architecture to create online classroom rooms. Teachers broadcast slides, PDFs, and media to connected students in real time, with server-side control over all clients.

Java · Android · Sockets · Android Studio

Fiverr Services

Work With Me

View Fiverr Profile
Vetted Pro · 5.0
$50/hr · ~1hr response
BigQuery, Snowflake & Databricks Setup
Data Warehouse

End-to-end cloud data warehouse design, implementation, and optimisation on BigQuery, Snowflake, or Databricks.

Starting from $150
Scalable ETL Pipelines in Python
Data ETLs

Design and build production-grade ETL pipelines, data workflows, and automation scripts using Python, Airflow, and Spark.

Starting from $150
Databricks Unity Catalog & Medallion ETL
Data Warehouse

Set up Databricks Unity Catalog, implement Bronze/Silver/Gold medallion layers, and build PySpark ETL pipelines at scale.

Starting from $180
TecDoc Automotive Data Pipelines
Data ETLs

Build specialised automotive parts data pipelines with TecDoc compatibility sets, cross-reference mapping, and marketplace sync.

Starting from $200
Production-Ready Web Scrapers
Data Scraping

Build high-performance, proxy-rotated, anti-bot web scrapers and automated data extraction pipelines using Scrapy and Selenium.

Starting from $60
FastAPI / Flask REST API + Database
API & Integrations

Design and build robust REST APIs with FastAPI or Flask, complete with database schema design, authentication, and deployment.

Starting from $120
KPI Dashboards & Data Visualizations
Data Dashboards

Create executive-ready KPI dashboards and data visualizations in Power BI, Tableau, or Looker Studio connected to your data sources.

Starting from $80
AI Automation Workflows
Automations & Agents

Build intelligent automation workflows and AI agents using n8n, Zapier, and Make to connect your tools and eliminate manual work.

Starting from $45
Need something custom?
Book a consultation or message me directly. I'll scope your project within 1 hour.

Technical Stack

Skills

🐍
Languages & Core
Python · SQL · Bash · HTML/CSS/JavaScript · asyncio · Jupyter Lab
Data Engineering
Apache Spark · PySpark · Apache Kafka · Apache Airflow · Databricks · Delta Lake · ETL/ELT · Event-Driven Processing
☁️
Cloud & Warehouses
Snowflake · BigQuery · AWS S3 · GCP Cloud Run · Azure Data Lake
🗄️
Databases
PostgreSQL · MySQL · MongoDB · SQLite · Query Optimisation · Schema Design
📊
Analytics & BI
Pandas · NumPy · scikit-learn · Matplotlib · Power BI · Tableau · Looker Studio
🕷️
Scraping & Ingestion
Scrapy · Selenium · BeautifulSoup · asyncio · Proxy Management · Headless Browsers
🔌
Backend & APIs
FastAPI · Flask · REST APIs · JSON · eBay API · Allegro API · TecDoc · Bootstrap
🔧
DevOps & CI/CD
Docker · Jenkins · Git · GitHub · GitLab · CI/CD Pipelines
🏗️
Architecture
Medallion (Bronze/Silver/Gold) · Data Lakehouse · Microservices · Event-Driven · ETL/ELT Patterns
🤖
AI & Automation
n8n · Make.com · Zapier · RAG Pipelines · LLM Integration · AI Agents · LangChain · Vector DBs · Prompt Engineering

Academic & Certifications

Education

🎓

M.Sc. Business Intelligence

Högskolan Dalarna, Sweden
Jan 2023 – Jan 2024

Thesis: Synthetic Data Generation for Privacy-Aware Homecare Support. Implemented GANs, CTGAN, VAEs, and GTransformers to create synthetic sensor data for downstream predictive ML models.

🏛️

B.Sc. Information Technology

University of Gujrat, Pakistan
Sep 2015 – Jul 2019

Foundation in software engineering, database systems, networking, and computer science fundamentals.

Certifications

🏅
Data Engineering Essentials
IBM
Nov 2025
🏅
ETL and Data Pipelines with Shell, Airflow & Kafka
IBM
Nov 2025
🏅
Introduction to PySpark
DataCamp
Nov 2025
🏅
Microsoft Office Specialist (Word / Excel / PowerPoint 2013)
Microsoft
Dec 2018

Get In Touch

Let's build something
great together

Open to Data Engineering roles, freelance pipeline projects, and cloud platform consulting. I respond within 24 hours.

hamzaraja983@gmail.com
LinkedIn · GitHub · 📞 +48 505 687 830