This tool is part of the Red Pin Geek Premium series.

For the full walkthrough — how to interpret your results, what to fix first, and real before/after data from my own store — read the companion post on Substack.


Premium Tool

The AI Store Test Kit

5 prompts that get ChatGPT to reveal exactly how it evaluates your store — and why it's recommending your competitors instead of you.

How to use these prompts

Run all 5 prompts in order, in the same ChatGPT conversation. Each builds on the last. The whole sequence takes about 10 to 15 minutes. Have a notepad ready; you'll want to capture what ChatGPT tells you.

1 The Discovery Test

Purpose

Find out if AI recommends your store when someone searches for exactly what you sell.

Template — Copy & Customize

I'm looking for a [handmade/artisan/one-of-a-kind] [gemstone name] [product type] from an independent jewelry designer. Something [style descriptor], set in [metal type], under $[price].

Andrea's Version

"I'm looking for a handmade aquamarine necklace from an independent jewelry designer. Something one-of-a-kind, set in gold, under $500."

Customization Tips

Words to use: Use the words your ideal buyer would type, not your marketing language.
Price ceiling: Include one. This tests whether AI can match your pricing in structured data.
Metal type: Be specific (e.g., "sterling silver," not just "silver") to test data precision.
"Independent": This filters out mass-market sellers and tests whether AI categorizes you correctly.

What to write down: Which stores were recommended. Were you included? How many total recommendations? You'll need the store names for Prompt 3.
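The monthly tracking protocol depends on running the exact same words every time, so it can help to keep your filled-in template values in one place. Below is a minimal sketch in Python, assuming you store each bracketed placeholder as a named field. The names `DiscoveryQuery`, `craft`, `gemstone`, `product`, `style`, `metal`, and `price` are hypothetical labels for illustration, not part of any tool.

```python
from dataclasses import asdict, dataclass

# The Discovery Test template, with each [bracket] turned into a named field.
TEMPLATE = (
    "I'm looking for a {craft} {gemstone} {product} from an independent "
    "jewelry designer. Something {style}, set in {metal}, under ${price}."
)


@dataclass(frozen=True)
class DiscoveryQuery:
    craft: str     # handmade / artisan / one-of-a-kind
    gemstone: str  # gemstone name
    product: str   # product type
    style: str     # style descriptor
    metal: str     # be specific: "sterling silver", not "silver"
    price: int     # price ceiling in dollars

    def render(self) -> str:
        # Fill every placeholder from the dataclass fields.
        return TEMPLATE.format(**asdict(self))


andrea = DiscoveryQuery("handmade", "aquamarine", "necklace",
                        "one-of-a-kind", "gold", 500)
print(andrea.render())
```

Filled with Andrea's values, this reproduces her version of the prompt word for word. The template keeps the article "a," so a craft word starting with a vowel sound (like "artisan") needs "an" adjusted by hand.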

2 The Reasoning Request

Purpose

Get ChatGPT to reveal the exact criteria and language signals it used to choose stores.

Prompt — Copy & Paste Directly

How did you decide which stores to recommend? What criteria did you use, and in what order? Be specific about what signals you looked for.

What ChatGPT Told Andrea

It searched for these specific language signals:

handmade, handcrafted, raw stone, natural variation, artisan, one-of-a-kind

It compromised on "gold," accepting gold-filled and vermeil because solid gold options were limited. It did NOT verify actual production methods.

What to write down: The specific language signals it lists. Whether it compromised on any of your constraints. Whether it mentions verifying production methods. This data feeds directly into your fix list.

3 The Competitor Comparison

Purpose

Get a direct evaluation of your store against the ones ChatGPT chose over you.

Template — Customize with Your URL

I run [your store URL]. Can you evaluate my store against the same criteria you used for your recommendations? What would I need to change to be included in your answer?

⚠️ Watch out for generic advice

ChatGPT sometimes suggests things like "build a filtered page for Aquamarine Necklaces Under $500." If you're an OOAK (one-of-a-kind) designer, you can't build a page for inventory you don't have. Use the diagnosis. Question the prescription.

4 The Score Request

Purpose

Get a numerical score you can track monthly to measure improvement.

Prompt — Copy & Paste Directly

On a scale of 1-10, how well does my store match the original query? Score me and explain your reasoning for each factor.

How to Read Your Score

1–3: AI can barely read your store. Fix structured data basics first.
4–5: AI sees you but can't match you to queries. Fix product data specificity.
6–7: AI recognizes your quality but can't verify details. Fix the machine-legibility gap. (This is where Andrea scored.)
8–9: AI can match and recommend you. Focus on third-party signals.
10: You're the default answer. Maintain and monitor.
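The score bands work as a simple lookup, which makes it easy to turn each month's number into a "fix first" action. Below is a minimal sketch in Python. It assumes that a fractional score like Andrea's 6.5 is grouped with the whole number below it, so 6.5 lands in 6–7 and 5.5 in 4–5. The function name `fix_first` is hypothetical.

```python
# Hypothetical helper: maps a 1-10 score to its band and the "fix first" action.
# Assumption: a fractional score is grouped with the whole number below it,
# so 6.5 lands in 6-7 and 9.5 lands in 8-9.
BANDS = [
    (4, "1-3", "Fix structured data basics first."),
    (6, "4-5", "Fix product data specificity."),
    (8, "6-7", "Fix the machine-legibility gap."),
    (10, "8-9", "Focus on third-party signals."),
]


def fix_first(score: float) -> tuple[str, str]:
    """Return (band, action) for a score between 1 and 10 inclusive."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    # Bands are checked in ascending order; the first upper bound
    # the score falls below wins.
    for upper, band, action in BANDS:
        if score < upper:
            return band, action
    return "10", "Maintain and monitor."  # only an exact 10 reaches here


band, action = fix_first(6.5)
print(band, action)  # prints: 6-7 Fix the machine-legibility gap.
```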

5 The Verification Test

Purpose

The killer prompt. It gets ChatGPT to admit it can't verify which stores are actually handmade, which reveals why linguistic legibility is the real game.

Prompt — Copy & Paste Directly

Be honest: when you evaluated stores for this query, did you actually verify whether they're genuinely handmade? Or did you rely on language signals? How confident are you in your recommendations?

What ChatGPT told Andrea:

"To be blunt, 'independent,' 'handmade,' and 'one-of-a-kind' get used loosely online. A better next pass would be to tighten the standards."

Why this matters: This answer is your entire strategy in one sentence. AI wants to verify authenticity but can't yet. The stores that make their real craft machine-legible now will win when verification tightens. That's your moat.

Monthly Tracking Protocol

Run this same prompt sequence once a month. Same words, same platform. Copy this table into a Google Sheet or notebook and fill it in each time.

Date | Platform | Score | Included? | Key Feedback
Apr 2026 | ChatGPT | 6.5 / 10 | No | Inventory matchability: cite-worthy but couldn't verify product match
May 2026 | ChatGPT | ___ / 10 | ___ | ___
Jun 2026 | ChatGPT | ___ / 10 | ___ | ___
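If you'd rather keep the log as a file than fill in a sheet by hand, a short script can append one row per run to a CSV file that Google Sheets can import. Below is a minimal sketch in Python using only the standard library. The function name `log_run` and the file name `ai_store_tracking.csv` are assumptions, and the columns mirror the tracking table.

```python
import csv
from pathlib import Path

# Column names copied from the tracking table.
COLUMNS = ["Date", "Platform", "Score", "Included?", "Key Feedback"]


def log_run(path: Path, date: str, platform: str, score: float,
            included: bool, feedback: str) -> None:
    """Append one test run to the CSV log, writing the header on first use."""
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(COLUMNS)
        # ":g" drops a trailing ".0", so 7.0 is written as "7 / 10".
        writer.writerow([date, platform, f"{score:g} / 10",
                         "Yes" if included else "No", feedback])


log_run(Path("ai_store_tracking.csv"), "Apr 2026", "ChatGPT", 6.5, False,
        "Inventory matchability: cite-worthy but couldn't verify product match")
```

Log one row per platform per month, so a Perplexity or Copilot run sits on its own line next to the ChatGPT one.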

Bonus: Run the same query on Perplexity, Google AI Mode, and Bing Copilot. Different platforms weight different signals — you might score higher on one, which tells you where your strengths are.

FREQUENTLY ASKED QUESTIONS

Do I need ChatGPT Plus to use the AI Store Test Kit?
No. The prompts work with the free version of ChatGPT. You'll get slightly more detailed responses with GPT-4, but the scoring and insights work with any version. You can also run the prompts in Claude, Perplexity, or Google Gemini — the results will vary by platform, which is actually useful data.
How long does the full test take?
About 10 to 15 minutes for all 5 prompts. Each prompt takes 2-3 minutes to run and read the response. I recommend doing all 5 in one sitting so you get the full picture, but you can split them across sessions if needed, as long as you continue in the same conversation.
How often should I re-run the test?
Monthly is the sweet spot. AI models update frequently, so your score can shift even without changes on your end. The tracking protocol in the kit helps you compare results over time so you can see what's actually moving the needle versus what's just model variance.
Does this work for stores on any platform?
Yes. The prompts test how AI sees your public-facing store — the data it can crawl and read. That's platform-agnostic. Whether you're on Shopify, Squarespace, Wix, WordPress, or a custom site, the test evaluates the same things: product data clarity, policy parsability, trust signals, and content depth.