Machine-Generated Code Detection: Whitespace and Embeddings

March 9, 2026

Detecting machine-generated code is difficult because style transfer and formatting tools can hide or distort the signals that detectors traditionally rely on. We explored both explicit structural features, such as whitespace patterns, and embedding-based representations.
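
To make "explicit structural features" concrete, here is a minimal sketch of a whitespace feature extractor. The function name `whitespace_features` and the four features it computes are illustrative assumptions, not the feature set used in our experiments.

```python
from statistics import mean


def whitespace_features(source: str) -> dict[str, float]:
    """Compute a few simple whitespace statistics for one source file.

    Hypothetical feature set, chosen only for illustration.
    """
    lines = source.splitlines()
    if not lines:
        return {
            "blank_ratio": 0.0,
            "trailing_ws_ratio": 0.0,
            "tab_indent_ratio": 0.0,
            "mean_indent": 0.0,
        }

    non_blank = [ln for ln in lines if ln.strip()]
    # Indent width counts leading spaces and tabs, one unit per character.
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_blank]

    return {
        # Share of lines that are empty or whitespace-only.
        "blank_ratio": (len(lines) - len(non_blank)) / len(lines),
        # Share of lines that carry code plus trailing whitespace.
        "trailing_ws_ratio": sum(ln != ln.rstrip() for ln in non_blank) / len(lines),
        # Share of lines whose indentation starts with a tab.
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in non_blank) / len(lines),
        # Average indent width over non-blank lines.
        "mean_indent": mean(indents) if indents else 0.0,
    }
```

Features like these are cheap to compute, but a formatter can erase every one of them, which is why they need to be checked against normalized inputs.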

Evaluation Principles

  • Use mixed-language datasets to avoid overfitting stylistic artifacts.
  • Stress test with formatting normalization and minimal-edit transformations.
  • Compare model confidence calibration across classes, not only raw accuracy.
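
The last two principles can be sketched in a few lines. This is an illustration under stated assumptions, not our evaluation harness: `normalize_whitespace` is a crude stand-in for a real formatter, `predict` is any hypothetical detector that returns 1 for machine-generated code and 0 otherwise, and `per_class_ece` groups samples by predicted class and applies a standard binned expected calibration error to each group.

```python
from typing import Callable, Sequence


def normalize_whitespace(source: str) -> str:
    """Crude formatter stand-in: expand tabs, strip trailing
    whitespace, and collapse runs of blank lines."""
    out: list[str] = []
    for line in source.expandtabs(4).splitlines():
        line = line.rstrip()
        if not line and out and not out[-1]:
            continue  # skip a blank line that follows another blank line
        out.append(line)
    return "\n".join(out).strip("\n") + "\n"


def flip_rate(predict: Callable[[str], int], samples: Sequence[str]) -> float:
    """Fraction of samples whose predicted label changes after normalization."""
    if not samples:
        return 0.0
    flips = sum(predict(s) != predict(normalize_whitespace(s)) for s in samples)
    return flips / len(samples)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean of |confidence - accuracy| per bin."""
    if not confidences:
        return 0.0
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / total * abs(avg_conf - accuracy)
    return ece


def per_class_ece(
    p_machine: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> dict[int, float]:
    """ECE computed separately for each predicted class (0 = human, 1 = machine).

    p_machine holds the detector's probability that a sample is machine-generated.
    """
    result: dict[int, float] = {}
    for cls in (0, 1):
        confs: list[float] = []
        correct: list[bool] = []
        for p, y in zip(p_machine, labels):
            pred = int(p >= 0.5)
            if pred == cls:
                confs.append(p if pred == 1 else 1.0 - p)
                correct.append(pred == y)
        result[cls] = expected_calibration_error(confs, correct, n_bins)
    return result
```

A high flip rate means the detector is keying on formatting rather than content. A large gap between the two per-class ECE values means its confidence is trustworthy for one label but not the other, which a single accuracy number would hide.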

A detector's quality shows in how well it holds up under transformation, not only in its benchmark scores.