Web Scraping Engineer — European Public Procurement

Nesso Labs
Indonesia · IDR 17000k–21000k/mo · Remote · 1mo ago
Employment
Full-time

About the role

We're building the data backbone for European public procurement. Our platform aggregates tender data from 100+ e-procurement portals — each with its own quirks, anti-bot protections, and legacy HTML.

We're looking for a scraping engineer who can navigate this landscape: someone who's comfortable with headless browsers, knows how to handle sessions and CAPTCHAs, and won't panic when the same platform serves three different HTML layouts across pages.

What you'll do

  • Build and maintain async scrapers (Python + Playwright) against Italian, and later other European, public procurement portals (Maggioli PortaleAppalti, ANAC, MePA, and others)

  • Handle real-world challenges: JSESSIONID session management, FriendlyCaptcha/Mosparo anti-bot checks, Cloudflare WAF, IP rotation with rate-limit backoff

  • Parse Italian data formats — amounts (€ 1.234.567,89), dates (DD/MM/YYYY, textual), CIG/CUP identifiers with placeholder detection

  • Extract and process documents: PDF, .p7m (PKCS#7 signed), ZIP/7Z archives, with OCR fallback

  • Integrate scrapers into our Prefect orchestration pipeline with monitoring, alerting, and anomaly detection

  • Work with PostgreSQL, Supabase, ClickHouse, and S3 for dual-sink storage with upsert/idempotency patterns
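
To give a flavor of the Italian-format parsing mentioned in the list above, here is a minimal sketch using only the standard library. The function names are hypothetical and not taken from the Nesso Labs codebase. The CIG check assumes a 10-character alphanumeric code and treats a single repeated character (such as 0000000000) as a placeholder; real portals may use other placeholder conventions.

```python
import re
from datetime import date
from decimal import Decimal

# Italian month names, used for textual dates such as "15 marzo 2024".
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
         "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre"],
        start=1,
    )
}

# Integer part is either dot-grouped thousands or plain digits;
# the decimal part, if present, follows a comma.
_AMOUNT_RE = re.compile(r"(\d{1,3}(?:\.\d{3})+|\d+)(?:,(\d+))?")


def parse_italian_amount(text: str) -> Decimal:
    """Parse an amount written as '€ 1.234.567,89' into a Decimal."""
    stripped = re.sub(r"[€\s]", "", text)  # drop the euro sign and whitespace
    match = _AMOUNT_RE.fullmatch(stripped)
    if match is None:
        raise ValueError(f"not an Italian-formatted amount: {text!r}")
    integer_part = match.group(1).replace(".", "")
    decimal_part = match.group(2) or "0"
    return Decimal(f"{integer_part}.{decimal_part}")


def parse_italian_date(text: str) -> date:
    """Parse 'DD/MM/YYYY' or a textual date such as '15 marzo 2024'."""
    cleaned = text.strip().lower()
    numeric = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", cleaned)
    if numeric:
        day, month, year = map(int, numeric.groups())
        return date(year, month, day)
    textual = re.fullmatch(r"(\d{1,2})\s+([a-z]+)\s+(\d{4})", cleaned)
    if textual and textual.group(2) in _MONTHS:
        return date(int(textual.group(3)), _MONTHS[textual.group(2)], int(textual.group(1)))
    raise ValueError(f"unrecognized date: {text!r}")


def is_plausible_cig(value: str) -> bool:
    """Return False for malformed codes and for assumed placeholders."""
    cig = value.strip().upper()
    if not re.fullmatch(r"[0-9A-Z]{10}", cig):
        return False
    # Assumption: a code made of one repeated character is a placeholder.
    return len(set(cig)) > 1
```

Using Decimal rather than float keeps monetary values exact, which matters once amounts are upserted and compared across runs.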

What we're looking for

  • Strong async Python — you think in asyncio, not time.sleep()

  • Playwright or Selenium experience — you've intercepted XHR responses, handled SPAs, and debugged timing issues

  • Resilience mindset — retry with backoff, graceful degradation, circuit breakers. Your scraper doesn't crash at 3 AM.

  • Comfort with messy HTML — you can write a multi-strategy extractor that handles <th>/<td>, <dt>/<dd>, and <label>/<span> on the same site

  • Data parsing skills — Italian locale, date formats, CIG validation, document type detection

  • Bonus: experience with Italian PA (Pubblica Amministrazione) portals, ANAC/PVL datasets, or OCDS data formats
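
As an illustration of the "retry with backoff" item in the list above, the following is a minimal asyncio sketch. The helper name, its parameters, and the jitter strategy are assumptions chosen for the example, not a description of Nesso Labs' actual pipeline.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await operation(), retrying failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on every attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent scrapers;
            # asyncio.sleep yields to the event loop instead of blocking it.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

A caller would wrap a single page fetch, for example `await retry_with_backoff(lambda: fetch_page(url))`, where `fetch_page` is a hypothetical coroutine. A circuit breaker would sit one level above this helper and stop calling a portal entirely after repeated failures.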

Tech stack
Python 3.11+ · Playwright · httpx · BeautifulSoup · Pydantic · SQLAlchemy 2.0 · PostgreSQL · Prefect · AWS S3 · Supabase

How we hire
No whiteboard algorithms. We'll send you a hands-on technical assessment: a mock procurement portal with real-world challenges. You build a scraper. We evaluate the code.

Perks & benefits

  • No whiteboard interviews
