Firecrawl-style endpoints (scrape / map / crawl / extract / batch) plus the full smart-crawler data contract (ProductData / DataSourceInfo). The same Bearer key works with whichever calling style you choose.
https://smartcrawler.io/api/v2 (the legacy v1 is still served at /api/v1)

# Option A · Bearer (Firecrawl-compatible)
curl -H "Authorization: Bearer sck_ikuBVCAjAKygdAxu8_DNSDc9iOJkgXMY7jBf5ceMmlw" \
  "https://smartcrawler.io/api/v2/sources"

# Option B · X-API-Key (legacy style, still supported)
curl -H "X-API-Key: sck_..." "https://smartcrawler.io/api/v2/sources"
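The two header styles above can be wrapped in one small helper so the rest of a client does not care which one is in use. This is a minimal sketch: the function name `auth_headers` and the environment variable `SMARTCRAWLER_API_KEY` are hypothetical, chosen here for illustration.

```python
import os


def auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers for either auth style (hypothetical helper)."""
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")


# SMARTCRAWLER_API_KEY is an assumed variable name, not part of the API.
API_KEY = os.environ.get("SMARTCRAWLER_API_KEY", "sck_...")
HEADERS = auth_headers(API_KEY)
```

Reading the key from the environment keeps it out of source control; either header dict can be passed unchanged to any HTTP client.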
Point the base URL at https://smartcrawler.io/api/v2 and swap the API key for our sck_ key; the scrape / map / crawl endpoints use exactly the same schema.
# Request
{
  "url": "https://www.songmics.com/products/sol-3782",
  "formats": ["markdown", "structured"],
  "only_main_content": true,
  "timeout": 30000
}

# Response
{
  "success": true,
  "url": "https://www.songmics.com/products/sol-3782",
  "crawl_url": "https://www.songmics.com/products/sol-3782",
  "site": "songmics_us",
  "data": {  // ProductData schema · see §4
    "sku": "SOL-3782",
    "title": "Electric Standing Desk 48-inch",
    "sale_price": 189.99,
    "currency": "USD",
    "ratings": 4.6,
    "review_count": 412,
    "image_urls": ["https://cdn..."],
    "product_url": "https://www.songmics.com/products/sol-3782",
    "site_url": "https://www.songmics.com/",
    "crawled_at": "2026-05-24T10:30:00",
    "confidence": 0.97
  },
  "markdown": "# Electric Standing Desk 48-inch\n\nPrice: 189.99 USD...",
  "scrape_id": "scr_5191fa07f09641bd",
  "credits_used": 1
}
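A scrape response like the one above is usually checked before its `data` is trusted. The sketch below assumes a failed scrape comes back with `"success": false` (the source does not show a failure body), and the `min_confidence` threshold of 0.8 is an arbitrary illustrative value, not an API default.

```python
def product_from_scrape(resp: dict, min_confidence: float = 0.8) -> dict:
    """Return the ProductData payload, or raise if the scrape is unusable."""
    # Assumption: failures are signalled by a falsy "success" flag.
    if not resp.get("success"):
        raise RuntimeError(f"scrape failed for {resp.get('url')}")
    data = resp["data"]
    confidence = data.get("confidence", 0.0)
    if confidence < min_confidence:
        raise ValueError(f"confidence {confidence} is below {min_confidence}")
    return data


def price_label(data: dict) -> str:
    """Format a short one-line summary from ProductData fields."""
    return f"{data['sku']} · {data['sale_price']} {data['currency']}"


# Trimmed copy of the response shown above.
sample = {
    "success": True,
    "url": "https://www.songmics.com/products/sol-3782",
    "data": {"sku": "SOL-3782", "sale_price": 189.99,
             "currency": "USD", "confidence": 0.97},
}
label = price_label(product_from_scrape(sample))
```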
# Request
{
  "url": "https://www.songmics.com/",
  "limit": 1000,
  "search": "desk"  // optional
}

# Response
{
  "success": true,
  "url": "https://www.songmics.com/",
  "site": "songmics_us",
  "links": [
    "https://www.songmics.com/products/sol-3782",
    "https://www.songmics.com/products/sol-3783"
  ],
  "count": 100,
  "credits_used": 1
}
# Request
{
  "url": "https://www.songmics.com/",
  "limit": 1000,
  "include_paths": ["^/products/"],
  "max_depth": 2
}

# Response · returned immediately
{
  "success": true,
  "job_id": 730,
  "status": "pending",
  "site": "songmics_us",
  "crawl_url": "https://www.songmics.com/",
  "poll_url": "/api/v2/crawl/730",
  "credits_used": 1000
}
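Since a crawl is billed up front (`credits_used: 1000` above), it can help to preview locally which links a given `include_paths` would keep, using the `links` array from a map response. The sketch assumes each pattern is a regular expression searched against the URL path; the source shows only the `^/products/` example, so the server's exact matching rules may differ. `filter_links` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse


def filter_links(links: list[str], include_paths: list[str]) -> list[str]:
    """Keep links whose URL path matches at least one pattern.

    Assumption: patterns are regexes applied to the path component only.
    """
    compiled = [re.compile(p) for p in include_paths]
    return [
        url for url in links
        if any(rx.search(urlparse(url).path) for rx in compiled)
    ]


links = [
    "https://www.songmics.com/products/sol-3782",
    "https://www.songmics.com/pages/about",  # illustrative non-product URL
]
kept = filter_links(links, ["^/products/"])
```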
{
  "success": true,
  "job_id": 730,
  "status": "success",  // pending / running / success / failed
  "site": "songmics_us",
  "crawl_url": "https://www.songmics.com/",
  "total": 4202,
  "products_count": 4202,
  "duration_sec": 42.3,
  "data": [ // ProductData × 100 (first batch) ]
}
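The four job states above suggest a polling loop that stops on `success` or `failed` and gives up after a deadline. A minimal sketch follows; `wait_for_crawl`, the 5-second interval and the 600-second timeout are illustrative choices, and `fetch_status` stands in for whatever function performs the GET on `poll_url` and returns the parsed JSON.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "failed"}


def wait_for_crawl(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl job did not finish before the timeout")
        sleep(interval)
```

Passing the fetcher and the sleep function in as arguments keeps the loop independent of any particular HTTP library and easy to exercise without network access.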
{
  "urls": ["https://a.com/p/1", "https://b.com/p/2"],
  "formats": ["structured"],
  "webhook": "https://yourapp.com/cb"  // optional
}
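When the URL list is long, the batch body above can be built in chunks. The sketch below is only an illustration: the source does not state a maximum batch size, so `chunk_size=100` is an assumed value, and `batch_payloads` is a hypothetical helper.

```python
from typing import Iterator, Optional, Sequence


def batch_payloads(
    urls: Sequence[str],
    chunk_size: int = 100,  # assumed limit; the real cap is not documented here
    formats: Sequence[str] = ("structured",),
    webhook: Optional[str] = None,
) -> Iterator[dict]:
    """Yield one batch request body per chunk of de-duplicated URLs."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique), chunk_size):
        payload = {
            "urls": unique[start:start + chunk_size],
            "formats": list(formats),
        }
        if webhook:
            payload["webhook"] = webhook
        yield payload


payloads = list(batch_payloads(
    ["https://a.com/p/1", "https://b.com/p/2", "https://a.com/p/1"],
    chunk_size=2,
))
```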
{
  "urls": ["https://..."],
  "schema": {
    "price": { "type": "number" },
    "in_stock": { "type": "boolean" },
    "variant_count": { "type": "integer" }
  },
  "prompt": "Extract pricing and stock info"
}
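Because extraction is prompt-driven, it is worth checking each returned record against the `schema` that was sent. The sketch assumes every extracted record is a flat dict keyed by the schema's field names; the source does not show the extract response, so that shape is a guess. `schema_violations` is a hypothetical helper.

```python
# Map the schema's type names to Python checks. bool is excluded from the
# numeric checks because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def schema_violations(record: dict, schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    for field, spec in schema.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        check = TYPE_CHECKS.get(spec.get("type"))
        if check is not None and not check(record[field]):
            problems.append(f"{field}: expected {spec['type']}")
    return problems


schema = {
    "price": {"type": "number"},
    "in_stock": {"type": "boolean"},
    "variant_count": {"type": "integer"},
}
```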
[
  {
    "site": "vidaxl_de",
    "crawl_url": "https://www.vidaxl.de/",
    "brand": "Vidaxl",
    "country": "DE",
    "platform": "vidaxl",
    "sku_count": 5000,
    "coverage_pct": 100.0,
    "status": "healthy",
    "last_crawled": "2026-05-24T06:05:43",
    "proxy_tier": "residential",
    "anti_bot_level": 4  // 1-5
  },
  ...
]
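The sources list above can be filtered client-side, for example to pick sites that are currently healthy and not too heavily protected. A small sketch: `pick_sources` and its default cut-off of level 3 are illustrative, and the `songmics_us` entry in the sample is made-up test data in the same shape as the response.

```python
def pick_sources(
    sources: list[dict],
    max_anti_bot: int = 3,
    statuses: tuple[str, ...] = ("healthy",),
) -> list[str]:
    """Return site codes whose status and anti-bot level pass the filter."""
    return [
        src["site"]
        for src in sources
        if src["status"] in statuses and src["anti_bot_level"] <= max_anti_bot
    ]


sources = [
    {"site": "vidaxl_de", "status": "healthy", "anti_bot_level": 4},
    # Illustrative entry, not copied from a real response.
    {"site": "songmics_us", "status": "healthy", "anti_bot_level": 1},
]
easy_sites = pick_sources(sources)
```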
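All product endpoints return the same unified schema; the table below lists its 18 fields. Records serialize to JSON, load into a DataFrame, or can be fed straight into an LLM context.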
| Field | Type | Meaning | Example |
|---|---|---|---|
| site | string | Internal site code | songmics_us |
| site_url | url | Site root URL (the crawled domain) | https://www.songmics.com/ |
| sku | string | Unique product identifier | SOL-3782 |
| spu | string? | Parent product identifier (groups variants) | SOL-3782-G |
| title | string | Product name | Electric Standing Desk 48-inch |
| description | string? | Product description | Electric height-adjustable office desk, quiet motor... |
| image_urls | string[] | List of product image URLs | ["https://cdn..."] |
| category_path | string? | Category path (/-separated) | Office Furniture/Desks |
| sale_price | number? | Current selling price | 189.99 |
| original_price | number? | Original (strikethrough) price | 209.99 |
| currency | string? | 3-letter currency code | USD / EUR / GBP |
| status | string? | Availability status | on_sale / out_of_stock / discontinued |
| ratings | number? | Average rating | 4.6 |
| review_count | integer? | Number of reviews | 412 |
| brand | string? | Brand | SONGMICS |
| product_url | url | Product detail page (PDP) URL, clickable | https://www.songmics.com/products/sol-3782 |
| crawled_at | iso8601? | Crawl timestamp | 2026-05-24T10:30:00 |
| confidence | number | Data confidence, 0-1 | 0.97 |
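On the client side, the field table above maps naturally onto a typed record. The dataclass below is a sketch rather than an official SDK type: fields marked `?` in the table become optional, unknown keys are ignored on load, and `discount_pct` is an illustrative derived value that the API itself does not return.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ProductData:
    # Required fields (no "?" in the table).
    site: str
    site_url: str
    sku: str
    title: str
    product_url: str
    confidence: float
    # Optional fields ("?" in the table) default to None or an empty list.
    spu: Optional[str] = None
    description: Optional[str] = None
    image_urls: list[str] = field(default_factory=list)
    category_path: Optional[str] = None
    sale_price: Optional[float] = None
    original_price: Optional[float] = None
    currency: Optional[str] = None
    status: Optional[str] = None
    ratings: Optional[float] = None
    review_count: Optional[int] = None
    brand: Optional[str] = None
    crawled_at: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "ProductData":
        """Build a record, silently dropping keys the table does not list."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})

    def discount_pct(self) -> Optional[float]:
        """Percentage off the original price, or None if either is missing."""
        if not self.sale_price or not self.original_price:
            return None
        return round((1 - self.sale_price / self.original_price) * 100, 1)


product = ProductData.from_dict({
    "site": "songmics_us",
    "site_url": "https://www.songmics.com/",
    "sku": "SOL-3782",
    "title": "Electric Standing Desk 48-inch",
    "product_url": "https://www.songmics.com/products/sol-3782",
    "confidence": 0.97,
    "sale_price": 189.99,
    "original_price": 209.99,
    "currency": "USD",
    "scrape_id": "scr_5191fa07f09641bd",  # not a ProductData field; ignored
})
```

Note that in the scrape response `site` sits at the top level rather than inside `data`, so it may need to be merged in before calling `from_dict`.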
| Field | Type | Meaning |
|---|---|---|
| site | string | Internal code · songmics_us / vidaxl_de / wayfair_us / ... |
| crawl_url | url | The URL that was actually crawled (important: it tells you where the data came from) |
| brand | string | Brand |
| country | string | 2-letter country code |
| platform | string | shopify / vue_spa / nuxt / vidaxl / wayfair / bol / cdiscount / ikea / westelm / cratebarrel / overstock / idealo / otto / allegro / ebay / houzz / article / generic |
| sku_count | integer | Number of SKUs crawled so far |
| coverage_pct | number | Coverage in percent (crawled / full catalog) |
| status | string | healthy / warning / critical / empty |
| last_crawled | iso8601? | Time of the last crawl |
| proxy_tier | string | none / datacenter / residential |
| anti_bot_level | integer | Anti-bot difficulty, 1-5: 1 easy (Shopify) / 2-3 medium (Cloudflare) / 4 hard (PerimeterX) / 5 hardest (Akamai+DataDome) |
Crawl target URLs for every site (also available directly from /api/v2/sources).
| Category | Sites | crawl_url pattern | Anti-bot level |
|---|---|---|---|
| Home brand direct stores | SONGMICS × 6 | songmics.com / songmics.de / .fr / .uk / .es / .it | L1 |
| Home brand direct stores | Costway × 9 | costway.com/.ca/.co.uk/.de/.fr/.it/.es/.nl/.pl | L2 |
| Home brand direct stores | Homary × 5 | homary.com / uk.homary.com / de./es./fr. | L2 |
| Home brand direct stores | Vidaxl × 12 | vidaxl.com/.co.uk/.ca/.ie/.de/.it/.es/.fr/.ro/.pt/.nl/.pl | L4 |
| Home brand direct stores | Flexispot × 9 | flexispot.com/.co.uk/.ca/.de/.it/.es/.fr/.nl/.pl | L2 |
| Home marketplaces | Wayfair / Overstock / WestElm / Crate&Barrel / Article / IKEA | wayfair.com / overstock.com / westelm.com / crateandbarrel.com / article.com / ikea.com/us/en/ | L2-L5 |
| European e-commerce | Otto / Bol / CDiscount / Idealo | otto.de / bol.com / cdiscount.com / idealo.de | L3-L4 |
| Large marketplaces | eBay / Allegro | ebay.com / allegro.pl | L5 |
| Other | BCP / Yaheetech / VonHaus / Woltu / Houzz | bestchoiceproducts.com / yaheetech.shop / vonhaus.com / woltu.eu / houzz.com | L1-L3 |
import time

import requests

API = "https://smartcrawler.io/api/v2"
KEY = "sck_ikuBVCAjAKygdAxu8_DNSDc9iOJkgXMY7jBf5ceMmlw"
H = {"Authorization": f"Bearer {KEY}"}

# Scrape · fetch one product page as structured data
r = requests.post(f"{API}/scrape", headers=H,
                  json={"url": "https://www.songmics.com/products/sol-3782"})
data = r.json()["data"]
print(f"{data['title']} · {data['sale_price']} {data['currency']}")

# Map · list every product URL on a site
r = requests.post(f"{API}/map", headers=H,
                  json={"url": "https://www.songmics.com/", "limit": 100})
urls = r.json()["links"]

# Crawl · crawl the whole site
r = requests.post(f"{API}/crawl", headers=H,
                  json={"url": "https://www.songmics.com/", "limit": 1000})
job = r.json()["job_id"]

# Poll · wait until the job reaches a terminal state
while True:
    r = requests.get(f"{API}/crawl/{job}", headers=H)
    s = r.json()
    if s["status"] in ("success", "failed"):
        break
    time.sleep(5)
# A failed job may carry no "data", so fall back to an empty list
print(f"Got {len(s.get('data', []))} products")
from firecrawl import FirecrawlApp

# Point api_url at our endpoint
app = FirecrawlApp(
    api_key="sck_ikuBVCAjAKygdAxu8_DNSDc9iOJkgXMY7jBf5ceMmlw",
    api_url="https://smartcrawler.io/api/v2"
)
r = app.scrape_url("https://www.songmics.com/products/sol-3782")
curl -X POST -H "Authorization: Bearer sck_iku..." \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.songmics.com/"}' \
  https://smartcrawler.io/api/v2/scrape
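| Status | Meaning |
|---|---|
| 200 | OK |
| 400 | Request body is missing a required field |
| 401 | Not authenticated / invalid API key |
| 404 | Resource not found (site is not in the list of 59 / job does not exist) |
| 429 | Rate limit exceeded |
| 500 | Server error |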
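Of the codes above, 429 and 500 are the ones a client might reasonably retry; 400, 401 and 404 will not succeed on a second attempt. The sketch below retries with exponential backoff. The retry policy, the delays and the name `call_with_retry` are assumptions of this example, since the source does not document rate-limit windows or a Retry-After header. `send` stands in for any function that performs the request and returns `(status_code, parsed_body)`.

```python
import time
from typing import Any, Callable, Tuple

RETRYABLE = {429, 500}  # assumed policy: rate limit and server error only


def call_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns 200, backing off 1s, 2s, 4s, ..."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(base_delay * 2 ** (attempt - 1))
```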
v1 (/api/v1/*) still works as before. v1 leans toward reading data (mostly GET); v2 leans toward crawling data (triggered by POST).
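- GET /api/v1/products?site=xxx · list crawled products
- GET /api/v1/reviews · list reviews
- GET /api/v1/promotions · list promotions
- POST /api/v2/scrape · actively trigger a single-URL scrape and return structured data
- POST /api/v2/crawl · trigger a full-site crawl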