

Verified GenAI: Agentic QC for Reliable Product Visualization

Science & technology · Technical AI safety · AI governance

Vahit FERYAD

Proposal · Grant
Closes March 2nd, 2026
$0 raised
$5,000 minimum funding
$20,000 funding goal


Project summary

Problem

E-commerce GenAI is good at making pretty images, but it is not reliable at preserving exact product identity (shape, color, texture, logo) across generations, nor at turning that consistency into a scalable pipeline. The missing piece is automated verification: today, most teams still rely on manual QC, which kills both scale and trust.

What I’m building

Verified GenAI: an agentic LLM+VLM quality-control layer that sits on top of image/video generation and automatically checks whether outputs are acceptable for production.

Core idea:

  • Generate candidate visuals (image and optional short video)

  • Run VLM/CLIP-based checks for product identity + artifact detection

  • Iterate or reject automatically (agentic loop)

  • Produce a final output with a QC report (scores + reasons)
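
Concretely, the loop just described can be sketched as below. This is a minimal illustration, assuming generate, verify, and decide are placeholder callables standing in for the ComfyUI pipeline, the VLM/CLIP checks, and the LLM controller; none of them are existing library APIs.

```python
# Minimal sketch of the agentic QC loop. generate(), verify() and decide() are
# placeholders for the ComfyUI pipeline, the VLM/CLIP checks and the LLM
# controller described above -- not existing library calls.
def verified_generate(reference, params, generate, verify, decide, max_attempts=5):
    best = None
    for attempt in range(max_attempts):
        candidate = generate(reference, params)        # one workflow run
        report = verify(reference, candidate)          # identity / color / artifact scores
        if best is None or report["overall"] > best[1]["overall"]:
            best = (candidate, report)                 # remember the best-scoring candidate
        if report["accepted"]:
            return candidate, report                   # passes the QC bar
        params = decide(params, report)                # controller proposes the next change
        if params is None:                             # controller rejects within budget
            break
    return best                                        # best effort + QC report, flagged rejected
```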

Scope and approach

1) Generation (baseline)

  • High-consistency product visualization using ComfyUI workflows:

    • SDXL + IP-Adapter/Refiner + ControlNet (depth/canny/lineart)

    • Image-to-video (I2V) and compositing where needed

  • Output is constrained by a reference product image + optional mask.
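
To make the knobs concrete, the per-run settings the workflow exposes could look like the sketch below; the field names are illustrative assumptions, not an existing ComfyUI schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationParams:
    """Illustrative per-run settings for the reference-guided SDXL workflow (names are assumptions)."""
    prompt: str
    negative_prompt: str = "blurry, deformed, extra parts, wrong logo"
    seed: int = 0
    steps: int = 30
    denoise: float = 0.6             # img2img strength; lower stays closer to the reference
    ip_adapter_weight: float = 0.8   # how strongly the reference product conditions generation
    controlnet_type: str = "depth"   # "depth" | "canny" | "lineart"
    controlnet_strength: float = 0.7
    mask_path: Optional[str] = None  # optional product mask for clean compositing
```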

2) Verification (the funded “researchy” layer)

  • VLM/embedding checks to ensure:

    • Identity consistency (product remains the same object)

    • Color fidelity (prevent drift)

    • Artifact detection (extra parts, broken geometry, wrong branding)

  • An agent controller (LLM) decides what to change next: prompt edits, strength, control settings, re-run, or stop.
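
The QC report that the controller consumes can stay simple; below is a sketch of one possible structure, with assumed field names and placeholder thresholds that would in practice be tuned on the evaluation set.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QCReport:
    identity_score: float    # embedding similarity between reference and generated product crop
    color_delta: float       # color difference measured on the product region
    artifacts: List[str] = field(default_factory=list)  # e.g. ["extra strap", "warped logo"]
    reasons: List[str] = field(default_factory=list)    # human-readable fail reasons

    def accepted(self, id_thresh: float = 0.85, color_thresh: float = 5.0) -> bool:
        # Thresholds here are placeholders; real values come from the benchmark set.
        return (self.identity_score >= id_thresh
                and self.color_delta <= color_thresh
                and not self.artifacts)
```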

3) Evaluation harness

  • A reproducible benchmark-style harness:

    • acceptance rate

    • failure taxonomy

    • quality vs cost (GPU time per accepted output)

    • regression tests for pipeline changes
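
As a rough illustration, the harness can aggregate these numbers from per-run records; the record fields below are assumptions about how each run is logged.

```python
from collections import Counter

def summarize(runs):
    """Aggregate per-run records of the form
    {"accepted": bool, "gpu_seconds": float, "failure_type": str or None}."""
    accepted = [r for r in runs if r["accepted"]]
    total_gpu = sum(r["gpu_seconds"] for r in runs)
    return {
        "acceptance_rate": len(accepted) / len(runs) if runs else 0.0,
        "gpu_seconds_per_accepted": total_gpu / len(accepted) if accepted else float("inf"),
        "failure_taxonomy": Counter(r["failure_type"] for r in runs if not r["accepted"]),
    }
```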

Deliverables

  • Open repo (or shareable private repo if required) with:

    • agentic orchestration code (FastAPI + worker queue)

    • ComfyUI workflows + configs

    • verification module + scoring outputs

  • Demo service: API endpoint that takes a product image URL and returns:

    • best output + QC report + trace

  • Technical report: methods, metrics, ablations, and measured improvements (baseline vs verified loop)
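
A skeletal version of the demo endpoint is sketched below, assuming a hypothetical run_verified_pipeline helper that implements the generate-verify-iterate loop; the route and response fields are illustrative.

```python
from fastapi import FastAPI
from pydantic import BaseModel

from pipeline import run_verified_pipeline  # hypothetical module implementing the agentic loop

app = FastAPI()

class GenerateRequest(BaseModel):
    product_image_url: str
    prompt: str

@app.post("/generate")
def generate_endpoint(req: GenerateRequest):
    result = run_verified_pipeline(req.product_image_url, req.prompt)
    return {
        "output_url": result["output_url"],  # best accepted (or best-effort) asset
        "qc_report": result["qc_report"],    # scores + fail reasons
        "trace": result["trace"],            # agent decisions per attempt
    }
```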

Milestones (example)

  • Week 1–2: baseline pipeline + dataset of failure cases + eval metrics definition

  • Week 3–4: verification + agentic iteration loop + automated reporting

  • Week 5–6: hardening, regression tests, cost/latency tuning, public demo

Why this is fundable (costs are real and justified)

  • GPU compute for controlled experiments (acceptance-rate improvement requires many runs)

  • Evaluation runs + ablations (quality vs cost curves)

  • Hosting for public demo + logging/monitoring

  • Optional: small dataset curation and annotation for failure categories

About me (relevant credibility)

PhD AI/ML Research Scientist (working in AI since 2014), with academic and industry experience and a record of production-grade delivery.

  • Remote U.S. collaboration: SMT (North Carolina) – sports video analytics for PFL (demo):
    https://drive.google.com/file/d/14RYbf63byBfrIr_9N-F0B9MjdaWrGA3k/view?usp=sharing

  • Publications:
    Edge devices object detection: https://www.researchgate.net/publication/376783175_Efficient_Object_Detection_Model_for_Edge_Devices
    Transformer/BERT NILM paper: https://www.mdpi.com/1996-1073/14/15/4649

  • Public demos (test links):
    Jewelry try-on / product visualization: https://renderfy-ai-lightbox.hf.space
    Fashion try-on (image-to-video + compositing): https://renderfy-fitsuite-ai.hf.space

  • LinkedIn: https://www.linkedin.com/in/vahit-feryad-19517256/

Expected impact

A practical, measurable step toward trustworthy, scalable GenAI for product visuals: fewer manual reviews, fewer bad outputs shipped, and a reusable QC framework that generalizes beyond Try-On.

What are this project's goals? How will you achieve them?

Goals

1) Make GenAI product visuals trustworthy

  • Goal: outputs preserve the same product identity (shape, texture, logo) and don’t drift in color or introduce artifacts.

  • Success metric: higher auto-acceptance rate at a fixed quality bar (vs. baseline generation without verification).

2) Reduce manual QC and make the workflow scalable

  • Goal: replace human review loops with an agentic verification loop that retries, fixes, or rejects automatically.

  • Success metric: fewer manual reviews per accepted asset; predictable cost per accepted output.

3) Produce a reproducible evaluation harness

  • Goal: a benchmark-like harness to measure quality, failure modes, and regressions across model/prompt/workflow changes.

  • Success metric: clear metrics dashboard + regression tests + ablation results.


How I’ll achieve them

A) Baseline generation pipeline (ComfyUI + SDXL stack)

  • Build strong reference-guided generation:

    • SDXL + IP-Adapter/Refiner

    • ControlNet (depth/canny/lineart) to constrain geometry

    • optional segmentation masks for clean compositing

  • Expose via GPU-backed FastAPI so it’s testable and reproducible.
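
One way to drive the ComfyUI workflows programmatically is to queue an exported API-format workflow against ComfyUI's local HTTP endpoint; a sketch, assuming a default local instance on port 8188:

```python
import json
import urllib.request

def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> str:
    """Queue an exported ComfyUI workflow (API-format JSON) and return the prompt id."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"http://{host}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

# Usage (assumes a workflow exported via ComfyUI's "Save (API Format)"):
# with open("sdxl_ipadapter_controlnet.json") as f:
#     prompt_id = queue_workflow(json.load(f))
```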

B) Verification module (VLM/CLIP-style scoring + rule checks)

Automated checks run on each candidate output:

  • Identity consistency: embedding similarity between reference product and generated product crop

  • Color fidelity: color-difference checks on the product region (prevent drift)

  • Artifact detection: detect extra parts/warping/logo corruption via VLM judgments + heuristics

  • Output: a QC report with scores + fail reasons.
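
Below is a minimal sketch of the identity and color checks using Hugging Face's CLIP implementation; the product cropping step and the downstream thresholds are assumptions, and VLM-based artifact judgments are omitted here.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def identity_score(reference: Image.Image, candidate_crop: Image.Image) -> float:
    """Cosine similarity between CLIP embeddings of the reference product and the generated crop."""
    inputs = _proc(images=[reference, candidate_crop], return_tensors="pt")
    with torch.no_grad():
        emb = _model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

def color_delta(reference: Image.Image, candidate_crop: Image.Image) -> float:
    """Crude color-drift check: mean absolute RGB difference between downscaled product regions."""
    a = np.asarray(reference.convert("RGB").resize((64, 64)), dtype=np.float32)
    b = np.asarray(candidate_crop.convert("RGB").resize((64, 64)), dtype=np.float32)
    return float(np.abs(a.mean(axis=(0, 1)) - b.mean(axis=(0, 1))).mean())
```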

C) Agentic loop (LLM controller to fix failures)

An LLM-based controller reads the QC report and chooses the next action:

  • adjust prompt/negative prompt

  • change img2img strength / denoise

  • tweak ControlNet conditioning / weights

  • rerun with different seed

  • stop and reject if quality can’t be achieved within budget
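
To keep the controller auditable, its output can be constrained to a small typed action space and validated before being applied; a sketch with assumed action names mirroring the list above (GenerationParams refers to the earlier parameter sketch):

```python
from dataclasses import replace
from enum import Enum

class Action(str, Enum):
    EDIT_PROMPT = "edit_prompt"
    CHANGE_DENOISE = "change_denoise"
    TWEAK_CONTROLNET = "tweak_controlnet"
    RESEED = "reseed"
    STOP = "stop"

def apply_action(params, action, value=None):
    """Apply a validated controller action to a GenerationParams instance (see earlier sketch)."""
    if action is Action.EDIT_PROMPT:
        return replace(params, prompt=str(value))
    if action is Action.CHANGE_DENOISE:
        return replace(params, denoise=max(0.2, min(0.9, float(value))))  # clamp to a safe range
    if action is Action.TWEAK_CONTROLNET:
        return replace(params, controlnet_strength=max(0.3, min(1.0, float(value))))
    if action is Action.RESEED:
        return replace(params, seed=params.seed + 1)
    return None  # STOP: reject within the retry budget
```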

D) Evaluation harness and reporting

  • Curate a test set of products + scenarios (including hard cases).

  • Track:

    • acceptance rate

    • failure taxonomy

    • GPU time per accepted output

    • quality vs cost trade-off

  • Run ablations to prove what improves results (verification alone vs agentic loop vs parameter changes).
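
Regression checks then become ordinary tests over the harness output; a pytest-style sketch, assuming the summarize helper sketched in the evaluation-harness section and a fixture that runs the frozen benchmark set (both hypothetical names):

```python
# test_regressions.py -- illustrative bars; the real thresholds come from measured baselines.
from eval_harness import summarize  # hypothetical module holding the summarize() sketch above

def test_acceptance_rate_does_not_regress(run_eval_set):
    # run_eval_set: assumed pytest fixture that pushes the frozen benchmark set
    # through the current pipeline and returns per-run records.
    stats = summarize(run_eval_set)
    assert stats["acceptance_rate"] >= 0.70          # bar set by the last accepted baseline
    assert stats["gpu_seconds_per_accepted"] <= 300  # cost guardrail
```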


Deliverables at the end

  • Working API: input product image → outputs best asset + QC report + trace

  • Open/shareable repo with workflows + verification + evaluation harness

  • A short technical report with metrics and comparisons (baseline vs Verified GenAI)

How will this funding be used?


How the funding will be used

1) GPU compute for controlled experiments and iterations

  • Running many generations per product is required to measure and improve:

    • acceptance rate

    • quality vs cost curves

    • ablations (with/without verification, different workflows, settings)

  • This is the main cost driver because the agentic loop intentionally does multiple retries until a strict QC threshold is met.

2) Hosting and infrastructure for a public demo API

  • A GPU-backed service (FastAPI + worker queue) with:

    • logging/tracing of agent decisions

    • storage for inputs/outputs and QC reports

    • basic monitoring (uptime, latency, error rates)
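
Tracing agent decisions does not need heavy infrastructure; appending one JSON line per attempt is enough for later analysis (the record fields are illustrative):

```python
import json
import time

def log_trace(path, attempt, action, qc_scores, accepted):
    """Append one JSON line per attempt so every agent decision stays auditable."""
    record = {
        "ts": time.time(),
        "attempt": attempt,
        "action": action,       # e.g. "reseed", "change_denoise"
        "qc": qc_scores,        # identity / color / artifact scores for this candidate
        "accepted": accepted,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```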

3) Evaluation dataset preparation (lightweight but necessary)

  • Building a small, representative benchmark set:

    • product reference images

    • scenario prompts/backgrounds

    • failure-case collection and labeling (artifact categories, identity drift, color drift)

  • This can be done mostly by me, with optional small paid annotation support if needed.

4) Engineering hardening and reproducibility

  • CI/regression tests to prevent quality regressions when changing:

    • ComfyUI workflows

    • model versions

    • verification thresholds

  • Packaging and documentation so others can reproduce results.

Budget sanity

I’m intentionally focusing spend on compute + minimal infra + evaluation, avoiding unnecessary overhead. The goal is a measurable, reproducible system rather than a flashy demo.

Who is on your team? What's your track record on similar projects?


Team

  • Vahit FERYAD (PhD) – AI/ML Research Scientist (AI since 2014), based in Istanbul.
    I will lead the project end-to-end: modeling choices, agent design, evaluation, and production deployment (API + GPU infra).

  • Optional support (only if budget allows): part-time annotation / QA help for labeling failure categories on a small evaluation set. This is not required to start and can be added later if it improves evaluation speed.


Track record on similar projects

1) Production-grade GenAI pipelines (image/video)

  • Built and deployed GPU-backed GenAI systems using ComfyUI + SDXL stacks and served behind FastAPI (async, batching, health checks).

  • Focus areas: high-consistency visual generation, workflow hardening, and automated verification components (CLIP/BLIP-style checks).

Public demos:

  • AI LightBox · Jewelry Virtual Try-On (high-consistency product visualization):
    https://renderfy-ai-lightbox.hf.space

  • FitSuite AI · Fashion Virtual Try-On (image-to-video + compositing):
    https://renderfy-fitsuite-ai.hf.space

2) Remote U.S. industry collaboration (computer vision in production)

  • Worked remotely with SMT (North Carolina) on Professional Fighters League (PFL) multi-camera video analytics, including real-time CV modeling for punch/kick speed analysis.
    Demo video: https://drive.google.com/file/d/14RYbf63byBfrIr_9N-F0B9MjdaWrGA3k/view?usp=sharing

3) Peer-reviewed publications showing research depth

  • Efficient object detection for edge devices:
    https://www.researchgate.net/publication/376783175_Efficient_Object_Detection_Model_for_Edge_Devices

  • Transformer-based NILM model using BERT (MDPI Energies):
    https://www.mdpi.com/1996-1073/14/15/4649

Why this matters for this grant

This project is not “just prompts.” It needs:

  • evaluation design + rigorous metrics

  • agentic iteration logic

  • production-grade deployment discipline

That combination is exactly where I’ve repeatedly delivered.

What are the most likely causes and outcomes if this project fails?

Most likely causes of failure

1) Verification isn’t reliable enough

  • Cause: VLM/CLIP-style similarity can miss subtle identity drift (small logo changes, minor geometry shifts) or over-reject valid outputs.

  • Outcome: low acceptance rate, too many false positives/negatives, weak improvement over baseline.

2) Cost per accepted output is too high

  • Cause: the agentic loop may need multiple retries to pass strict QC, driving GPU spend up.

  • Outcome: the system works technically but is not economically viable for production.

3) Domain generalization is worse than expected

  • Cause: methods tuned for product visuals / try-on may not generalize to other categories or lighting/background conditions.

  • Outcome: results look good on a narrow demo set but don’t scale across varied products.

4) Data/evaluation set is not representative

  • Cause: benchmark set is too small or biased; failure taxonomy incomplete.

  • Outcome: “improvements” don’t hold up in real use, and regression risk remains.

5) Tooling and integration complexity

  • Cause: ComfyUI workflows + model versions + infra can be brittle; changes can silently degrade output quality.

  • Outcome: hard-to-reproduce results; maintenance burden increases.


If it fails, what do we still get? (salvageable outcomes)

Even in a “failure” scenario, we still produce useful assets:

  • A reproducible evaluation harness for product accuracy in GenAI (baseline + metrics + failure taxonomy).

  • A set of verified baselines showing what does and doesn’t work (ablation results).

  • A production-ready API wrapper + logging/tracing around generation workflows.

  • Clear evidence on whether current VLM/CLIP methods are sufficient for strict product identity verification, and what gaps remain.

So the worst case is not “nothing works.” The worst case is that verification doesn’t meet a strict bar, and even then we still produce a solid, publishable engineering/research package and a benchmark others can build on.

How much money have you raised in the last 12 months, and from where?

I have not raised any funding in the last 12 months (no grants, investors, or institutional funding).
