Machine-Generated Code Detection: Whitespace and Embeddings
March 9, 2026 5 min read
Code generation detection is nuanced because style transfer and formatting tools can hide or distort traditional signals. We explored both explicit structural features and embedding-based representations.
Evaluation Principles
- Use mixed-language datasets to avoid overfitting stylistic artifacts.
- Stress test with formatting normalization and minimal-edit transformations.
- Compare model confidence calibration across classes, not only raw accuracy.
Detection quality is about resilience to transformation, not just benchmark wins.