AI vs. Human Art Classification

A deep-learning classifier that distinguishes AI-generated from human-made artwork, with interpretability via Grad-CAM, t-SNE, and PCA.

April 1, 2025

A CNN classifier trained on the Kaggle Human AI Artwork Dataset (270k+ images across 47 style categories) that answers three progressively harder questions — is this image AI-generated; which generator made it; and what style is it in — while making its decisions inspectable.

Try it in your browser

Everything runs locally via ONNX Runtime Web — the image never leaves your device. The saliency overlay is produced by sliding a 24×24 gray occlusion patch across the image and measuring how much the predicted-class confidence drops when each region is blocked.
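The occlusion procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the demo's actual code (which runs in the browser): `predict_fn` is a hypothetical stand-in for the ONNX model, and the stride, the gray value of 0.5, and the `[0, 1]` pixel range are assumptions. Only the 24×24 patch size comes from the description.

```python
import numpy as np

def occlusion_saliency(image, predict_fn, patch=24, stride=24, fill=0.5):
    """Return (heatmap, target_class) for an H x W x C float image in [0, 1].

    predict_fn is a hypothetical callable that maps an image to a 1-D array
    of class probabilities (the role the ONNX model plays in the demo).
    """
    height, width = image.shape[:2]
    base_probs = np.asarray(predict_fn(image), dtype=np.float64)
    target = int(np.argmax(base_probs))   # class predicted on the clean image
    base_conf = base_probs[target]

    drop_sum = np.zeros((height, width))
    cover_count = np.zeros((height, width))

    for top in range(0, height, stride):
        for left in range(0, width, stride):
            bottom = min(top + patch, height)
            right = min(left + patch, width)
            occluded = image.copy()
            occluded[top:bottom, left:right] = fill   # gray out one region
            conf = np.asarray(predict_fn(occluded), dtype=np.float64)[target]
            # A large drop means the blocked region mattered for the prediction.
            drop_sum[top:bottom, left:right] += base_conf - conf
            cover_count[top:bottom, left:right] += 1

    # Average the drops where patches overlap (only when stride < patch).
    heatmap = drop_sum / np.maximum(cover_count, 1)
    return heatmap, target

# Toy check: a "model" whose class-1 score is the mean brightness of the
# top-left 24x24 corner, so only that corner should light up.
toy_image = np.zeros((48, 48, 3))
toy_image[:24, :24] = 1.0

def toy_predict(img):
    p = float(img[:24, :24].mean())
    return np.array([1.0 - p, p])

heatmap, target = occlusion_saliency(toy_image, toy_predict)
```

With a stride equal to the patch size, each pixel is covered exactly once and the cost is one forward pass per patch. A smaller stride gives a smoother map at proportionally higher cost.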

Results

Task                                           Classes  Top-1   Top-5   F1
Binary (Human vs AI)                           2        93.71%  -       0.94
Grouped (AI_SD / AI_LD / DiffusionDB / Human)  4        92.85%  99.42%  0.93
Flat style classification                      47       44.38%  84.22%  0.39 (weighted)
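For readers unfamiliar with the columns, the sketch below shows one common way to compute top-k accuracy and support-weighted F1 from model outputs. It is illustrative only: the function names and toy data are made up here, and the source does not say how F1 was averaged for the binary and grouped rows.

```python
import numpy as np

def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

def weighted_f1(y_true, y_pred, n_classes):
    """Per-class F1, averaged with weights proportional to each class's support."""
    total = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * np.sum(y_true == c)
    return float(total / len(y_true))

# Toy data: 4 samples, 3 classes (not taken from the project).
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
])
toy_labels = np.array([0, 1, 2, 2])
toy_preds = toy_probs.argmax(axis=1)

top1 = top_k_accuracy(toy_probs, toy_labels, k=1)
top2 = top_k_accuracy(toy_probs, toy_labels, k=2)
f1_w = weighted_f1(toy_labels, toy_preds, n_classes=3)
```

Weighted F1 lets large classes dominate the average, so when class sizes are uneven a low score such as 0.39 mostly reflects the most populous styles.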

Tuned architecture: 3 convolutional layers (32 → 64 → 64 filters), batch normalization, L2 weight regularization, dropout 0.42, learning rate 0.005.

Confusion matrix for the tuned binary CNN — 14,391 Human correct, 36,586 AI correct, misclassifications symmetric

Interpretability

Three lenses on what the model learned:

  • Grad-CAM — spatial attention maps per image, useful for checking whether the model keys on brushstroke-level features vs. composition-level ones.
  • t-SNE / PCA on the penultimate layer — clusters separate AI generators cleanly; human styles overlap more.
  • Style prototypes — per-class nearest neighbors to the class centroid, which surface what each style “looks like” to the model.

t-SNE projection of the 47-style flat CNN's 128-dimensional latent space. AI sub-generators cluster tightly; human styles spread and overlap.
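Two of these lenses, the PCA projection and the style prototypes, need nothing beyond the penultimate-layer embeddings. The sketch below assumes those embeddings are already extracted into an `(n_samples, 128)` array. The function names, the Euclidean distance metric, and the random stand-in data are illustrative assumptions, not the project's code.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project row-vector features onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def style_prototypes(features, labels, k=5):
    """Map each class to the indices of its k samples nearest the class centroid."""
    prototypes = {}
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        centroid = features[idx].mean(axis=0)
        # Euclidean distance is an assumption; cosine distance would also work.
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        prototypes[int(cls)] = idx[np.argsort(dists)[:k]]
    return prototypes

# Random stand-in for real embeddings: 4 classes, 50 samples each, 128 dims.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))
class_ids = np.repeat(np.arange(4), 50)

coords_2d = pca_project(embeddings)                     # shape (200, 2), ready to scatter-plot
protos = style_prototypes(embeddings, class_ids, k=5)   # {class: array of 5 sample indices}
```

Viewing the images behind each class's prototype indices shows what a style "looks like" to the model. Classes whose prototypes look alike are likely the ones that overlap in the projection.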

Takeaways

  • The binary AI/Human signal is strong and largely texture-driven.
  • Grouped classification of which AI engine generated an image is nearly as easy as binary detection (92.85% vs. 93.71% top-1 accuracy): different diffusion pipelines leave different fingerprints.
  • 47-way style classification is genuinely hard (0.39 weighted F1, 44.38% top-1 accuracy); the model falls back to broad-class cues when style cues conflict, which is visible in the confusion matrices.

Code: github.com/rohit-ravi2/visual-classification-ai-vs-human-art