E-commerce GenAI is good at making pretty images, but it is not reliable at preserving exact product identity (shape, color, texture, logo) across generations, which makes it hard to turn into a scalable pipeline. The missing piece is automated verification: today, most teams still rely on manual QC, which kills scale and trust.
Verified GenAI: an agentic LLM+VLM quality-control layer that sits on top of image/video generation and automatically checks whether outputs are acceptable for production.
Core idea:
Generate candidate visuals (image and optional short video)
Run VLM/CLIP-based checks for product identity + artifact detection
Iterate or reject automatically (agentic loop)
Produce a final output with a QC report (scores + reasons)
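As a rough illustration, the loop could be structured like this (Python sketch; `generate`, `run_qc_checks`, and `propose_fix` are hypothetical helpers, not the final implementation):

```python
# Illustrative generate -> verify -> iterate loop; helper functions are placeholders.
MAX_ATTEMPTS = 4          # retry budget per product
ACCEPT_THRESHOLD = 0.85   # minimum aggregate QC score to accept

def verified_generation(reference_image, prompt, params):
    best = None
    for attempt in range(MAX_ATTEMPTS):
        candidate = generate(reference_image, prompt, params)   # ComfyUI/SDXL generation call
        report = run_qc_checks(reference_image, candidate)      # identity / color / artifact scores
        if best is None or report["score"] > best["report"]["score"]:
            best = {"image": candidate, "report": report, "attempt": attempt}
        if report["score"] >= ACCEPT_THRESHOLD:
            return {"status": "accepted", **best}
        params = propose_fix(report, params)                    # LLM controller picks the next change
    return {"status": "rejected", **best}                       # reject, but keep best output + report
```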
1) Generation (baseline)
High-consistency product visualization using ComfyUI workflows:
SDXL + IP-Adapter/Refiner + ControlNet (depth/canny/lineart)
Image-to-video (I2V) and compositing where needed
Output is constrained by a reference product image + optional mask.
2) Verification (the funded “researchy” layer)
VLM/embedding checks to ensure:
Identity consistency (product remains the same object)
Color fidelity (prevent drift)
Artifact detection (extra parts, broken geometry, wrong branding)
An agent controller (LLM) decides what to change next: prompt edits, strength, control settings, re-run, or stop.
3) Evaluation harness
A reproducible benchmark-style harness:
acceptance rate
failure taxonomy
quality vs cost (GPU time per accepted output)
regression tests for pipeline changes
Open repo (or shareable private repo if required) with:
agentic orchestration code (FastAPI + worker queue)
ComfyUI workflows + configs
verification module + scoring outputs
Demo service: API endpoint that takes a product image URL and returns:
best output + QC report + trace
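A hedged sketch of what that endpoint contract might look like (FastAPI; field names and the `run_verified_generation` call are illustrative assumptions):

```python
# Illustrative FastAPI contract; request/response fields and the worker call are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel, HttpUrl

app = FastAPI()

class GenerateRequest(BaseModel):
    product_image_url: HttpUrl
    scenario_prompt: str

class GenerateResponse(BaseModel):
    status: str                    # "accepted" or "rejected"
    output_url: str | None = None  # best generated asset
    qc_report: dict                # per-check scores + fail reasons
    trace: list[dict]              # agent decisions per attempt

@app.post("/generate", response_model=GenerateResponse)
async def generate_asset(req: GenerateRequest) -> GenerateResponse:
    # Enqueue the job on the GPU worker queue and await the verified result (hypothetical helper).
    result = await run_verified_generation(str(req.product_image_url), req.scenario_prompt)
    return GenerateResponse(**result)
```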
Technical report: methods, metrics, ablations, and measured improvements (baseline vs verified loop)
Week 1–2: baseline pipeline + dataset of failure cases + eval metrics definition
Week 3–4: verification + agentic iteration loop + automated reporting
Week 5–6: hardening, regression tests, cost/latency tuning, public demo
GPU compute for controlled experiments (acceptance-rate improvement requires many runs)
Evaluation runs + ablations (quality vs cost curves)
Hosting for public demo + logging/monitoring
Optional: small dataset curation and annotation for failure categories
PhD AI/ML Research Scientist (AI since 2014), academic + industry, production-grade delivery.
Remote U.S. collaboration: SMT (North Carolina) – sports video analytics for PFL (demo):
https://drive.google.com/file/d/14RYbf63byBfrIr_9N-F0B9MjdaWrGA3k/view?usp=sharing
Publications:
Edge devices object detection: https://www.researchgate.net/publication/376783175_Efficient_Object_Detection_Model_for_Edge_Devices
Transformer/BERT NILM paper: https://www.mdpi.com/1996-1073/14/15/4649
Public demos (test links):
Jewelry try-on / product visualization: https://renderfy-ai-lightbox.hf.space
Fashion try-on (image-to-video + compositing): https://renderfy-fitsuite-ai.hf.space
LinkedIn: https://www.linkedin.com/in/vahit-feryad-19517256/
A practical, measurable step toward trustworthy, scalable GenAI for product visuals: fewer manual reviews, fewer bad outputs shipped, and a reusable QC framework that generalizes beyond Try-On.
Goal: outputs preserve the same product identity (shape, texture, logo) and don’t drift in color or introduce artifacts.
Success metric: higher auto-acceptance rate at a fixed quality bar (vs. baseline generation without verification).
Goal: replace human review loops with an agentic verification loop that retries, fixes, or rejects automatically.
Success metric: fewer manual reviews per accepted asset; predictable cost per accepted output.
Goal: a benchmark-like harness to measure quality, failure modes, and regressions across model/prompt/workflow changes.
Success metric: clear metrics dashboard + regression tests + ablation results.
Build strong reference-guided generation:
SDXL + IP-Adapter/Refiner
ControlNet (depth/canny/lineart) to constrain geometry
optional segmentation masks for clean compositing
Expose via GPU-backed FastAPI so it’s testable and reproducible.
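For illustration, the tunable generation knobs that the later agentic loop adjusts could be grouped into one parameter object (names and default values below are assumptions, not final settings):

```python
# Illustrative parameter object for a single generation attempt; names/defaults are assumptions.
from dataclasses import dataclass

@dataclass
class GenParams:
    prompt: str
    negative_prompt: str = "deformed, extra parts, wrong logo"
    ip_adapter_weight: float = 0.7   # how strongly the reference product image guides generation
    controlnet_type: str = "depth"   # "depth" | "canny" | "lineart"
    controlnet_weight: float = 0.8   # geometry constraint strength
    denoise_strength: float = 0.55   # img2img strength
    seed: int = 0
    use_mask: bool = True            # segmentation mask for clean compositing
```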
Automated checks run on each candidate output:
Identity consistency: embedding similarity between reference product and generated product crop
Color fidelity: color-difference checks on the product region (prevent drift)
Artifact detection: detect extra parts/warping/logo corruption via VLM judgments + heuristics
Output: a QC report with scores + fail reasons.
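A minimal sketch of the first two checks, assuming a CLIP image encoder from `transformers` and CIEDE2000 from `scikit-image` (the real module would add product-region cropping and VLM judgments):

```python
# Sketch: identity similarity (CLIP embeddings) + color drift (mean CIEDE2000) on the product region.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from skimage.color import rgb2lab, deltaE_ciede2000

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def identity_similarity(ref_crop: Image.Image, gen_crop: Image.Image) -> float:
    inputs = processor(images=[ref_crop, gen_crop], return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])   # cosine similarity in [-1, 1]

def color_drift(ref_crop: Image.Image, gen_crop: Image.Image) -> float:
    size = (256, 256)
    ref = rgb2lab(np.asarray(ref_crop.convert("RGB").resize(size)) / 255.0)
    gen = rgb2lab(np.asarray(gen_crop.convert("RGB").resize(size)) / 255.0)
    return float(deltaE_ciede2000(ref, gen).mean())   # lower = closer colors
```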
An LLM-based controller reads the QC report and chooses the next action:
adjust prompt/negative prompt
change img2img strength / denoise
tweak ControlNet conditioning / weights
rerun with different seed
stop and reject if quality can’t be achieved within budget
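A simplified, rule-style sketch of that decision step (in practice the LLM reads the full QC report; the thresholds and field names below are illustrative, and `params` follows the `GenParams` sketch above):

```python
# Illustrative controller policy; an LLM would choose among the same actions from the QC report.
def next_action(report: dict, params: GenParams, attempt: int, max_attempts: int = 4):
    if attempt >= max_attempts:
        return "reject", params                              # retry budget exhausted
    if report["identity_similarity"] < 0.80:
        params.ip_adapter_weight = min(1.0, params.ip_adapter_weight + 0.1)
        params.denoise_strength = max(0.3, params.denoise_strength - 0.1)
        return "rerun", params                               # pull output closer to the reference
    if report["color_drift"] > 6.0:                          # CIEDE2000 units
        params.negative_prompt += ", color shift, tinted lighting"
        return "rerun", params
    if report["artifact_flags"]:
        params.controlnet_weight = min(1.0, params.controlnet_weight + 0.1)
        params.seed += 1
        return "rerun", params                               # tighten geometry, try a new seed
    return "accept", params
```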
Curate a test set of products + scenarios (including hard cases).
Track:
acceptance rate
failure taxonomy
GPU time per accepted output
quality vs cost trade-off
Run ablations to prove what improves results (verification alone vs agentic loop vs parameter changes).
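As a sketch, the headline numbers could be computed from per-run logs like this (record fields are assumptions):

```python
# Sketch: acceptance rate, GPU cost per accepted output, and a failure-reason tally from run records.
def summarize(runs: list[dict]) -> dict:
    accepted = [r for r in runs if r["status"] == "accepted"]
    total_gpu_s = sum(r["gpu_seconds"] for r in runs)
    reasons = [x for r in runs for x in r.get("fail_reasons", [])]
    return {
        "acceptance_rate": len(accepted) / len(runs) if runs else 0.0,
        "gpu_seconds_per_accepted": total_gpu_s / len(accepted) if accepted else float("inf"),
        "failure_counts": {reason: reasons.count(reason) for reason in set(reasons)},
    }
```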
Working API: input product image → outputs best asset + QC report + trace
Open/shareable repo with workflows + verification + evaluation harness
A short technical report with metrics and comparisons (baseline vs Verified GenAI)
How the funding will be used
Running many generations per product is required to measure and improve:
acceptance rate
quality vs cost curves
ablations (with/without verification, different workflows, settings)
This is the main cost driver because the agentic loop intentionally does multiple retries until a strict QC threshold is met.
A GPU-backed service (FastAPI + worker queue) with:
logging/tracing of agent decisions
storage for inputs/outputs and QC reports
basic monitoring (uptime, latency, error rates)
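A sketch of the per-attempt record such a service might log and store (schema is an assumption):

```python
# Illustrative per-attempt trace record stored alongside inputs, outputs, and QC reports.
from pydantic import BaseModel

class AttemptRecord(BaseModel):
    run_id: str
    attempt: int
    params: dict                     # generation settings used for this attempt
    qc_scores: dict                  # identity / color / artifact scores
    fail_reasons: list[str] = []
    action: str                      # controller decision: "rerun", "accept", or "reject"
    gpu_seconds: float
    output_path: str | None = None
```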
Building a small, representative benchmark set:
product reference images
scenario prompts/backgrounds
failure-case collection and labeling (artifact categories, identity drift, color drift)
This can be done mostly by me, with optional small paid annotation support if needed.
CI/regression tests to prevent quality regressions when changing:
ComfyUI workflows
model versions
verification thresholds
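A minimal pytest-style sketch of such a regression gate (the summary file path, field names, and pinned baselines are illustrative):

```python
# Sketch: CI regression test that fails if key metrics drop below pinned floors.
import json

BASELINE = {"acceptance_rate": 0.70, "mean_identity_similarity": 0.85}  # pinned from a reference run

def test_no_quality_regression():
    with open("eval/latest_summary.json") as f:   # written by the evaluation harness
        summary = json.load(f)
    assert summary["acceptance_rate"] >= BASELINE["acceptance_rate"] - 0.02
    assert summary["mean_identity_similarity"] >= BASELINE["mean_identity_similarity"] - 0.01
```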
Packaging and documentation so others can reproduce results.
I’m intentionally focusing spend on compute + minimal infra + evaluation, avoiding unnecessary overhead. The goal is a measurable, reproducible system rather than a flashy demo.
Team
Vahit FERYAD (PhD) – AI/ML Research Scientist (AI since 2014), based in Istanbul.
I will lead the project end-to-end: modeling choices, agent design, evaluation, and production deployment (API + GPU infra).
Optional support (only if budget allows): part-time annotation / QA help for labeling failure categories on a small evaluation set. This is not required to start and can be added later if it improves evaluation speed.
Built and deployed GPU-backed GenAI systems using ComfyUI + SDXL stacks, served behind FastAPI (async, batching, health checks).
Focus areas: high-consistency visual generation, workflow hardening, and automated verification components (CLIP/BLIP-style checks).
Public demos:
AI LightBox · Jewelry Virtual Try-On (high-consistency product visualization):
https://renderfy-ai-lightbox.hf.space
FitSuite AI · Fashion Virtual Try-On (image-to-video + compositing):
https://renderfy-fitsuite-ai.hf.space
Worked remotely with SMT (North Carolina) on Professional Fighters League (PFL) multi-camera video analytics, including real-time CV modeling for punch/kick speed analysis.
Demo video: https://drive.google.com/file/d/14RYbf63byBfrIr_9N-F0B9MjdaWrGA3k/view?usp=sharing
Efficient object detection for edge devices:
https://www.researchgate.net/publication/376783175_Efficient_Object_Detection_Model_for_Edge_Devices
Transformer-based NILM model using BERT (MDPI Energies):
https://www.mdpi.com/1996-1073/14/15/4649
This project is not “just prompts.” It needs:
evaluation design + rigorous metrics
agentic iteration logic
production-grade deployment discipline
That combination is exactly where I’ve repeatedly delivered.
Cause: VLM/CLIP-style similarity can miss subtle identity drift (small logo changes, minor geometry shifts) or over-reject valid outputs.
Outcome: low acceptance rate, too many false positives/negatives, weak improvement over baseline.
Cause: the agentic loop may need multiple retries to pass strict QC, driving GPU spend up.
Outcome: the system works technically but is not economically viable for production.
Cause: methods tuned for product visuals / try-on may not generalize to other categories or lighting/background conditions.
Outcome: results look good on a narrow demo set but don’t scale across varied products.
Cause: benchmark set is too small or biased; failure taxonomy incomplete.
Outcome: “improvements” don’t hold in real use, regression risk remains.
Cause: ComfyUI workflows + model versions + infra can be brittle; changes can silently degrade output quality.
Outcome: hard-to-reproduce results; maintenance burden increases.
Even in a “failure” scenario, we still produce useful assets:
A reproducible evaluation harness for product-accuracy in GenAI (baseline + metrics + failure taxonomy).
A set of verified baselines showing what does and doesn’t work (ablation results).
A production-ready API wrapper + logging/tracing around generation workflows.
Clear evidence on whether current VLM/CLIP methods are sufficient for strict product identity verification, and what gaps remain.
So the worst case is not “nothing works”; the worst case is “verification doesn’t meet a strict bar,” but we still generate a solid, publishable engineering/research package and a benchmark others can build on.
I have not raised any funding in the last 12 months (no grants, investors, or institutional funding).