My work at Spider Lab focused on empirical software engineering and developer tooling research. I engineered telemetry instrumentation for a VS Code extension to capture runtime behavior, developer interactions, and failure cases. Designing this instrumentation required weighing performance overhead against data granularity and deciding which signals meaningfully reflect developer experience and software quality.
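As a rough illustration of the kind of instrumentation involved, the sketch below registers lightweight event listeners in a VS Code extension and batches events before writing them out. The event names, buffer size, and flush strategy here are illustrative assumptions, not the lab's actual implementation.

```typescript
// Minimal sketch of event-level telemetry in a VS Code extension.
// Event names, buffer size, and flush strategy are illustrative assumptions.
import * as vscode from 'vscode';

interface TelemetryEvent {
    kind: string;                       // e.g. 'save', 'editorSwitch'
    timestamp: number;                  // epoch milliseconds
    payload?: Record<string, unknown>;  // small, non-sensitive context
}

const buffer: TelemetryEvent[] = [];
const MAX_BUFFER = 100;  // flush threshold chosen to bound memory overhead

function record(kind: string, payload?: Record<string, unknown>): void {
    buffer.push({ kind, timestamp: Date.now(), payload });
    if (buffer.length >= MAX_BUFFER) {
        flush();
    }
}

function flush(): void {
    // In practice this would write to disk or post to a collection endpoint;
    // here the buffer is simply drained to keep the sketch self-contained.
    const batch = buffer.splice(0, buffer.length);
    console.log(`flushing ${batch.length} telemetry events`);
}

export function activate(context: vscode.ExtensionContext): void {
    // Capture developer interactions without blocking the editor.
    context.subscriptions.push(
        vscode.workspace.onDidSaveTextDocument(doc =>
            record('save', { languageId: doc.languageId })),
        vscode.window.onDidChangeActiveTextEditor(editor =>
            record('editorSwitch', { scheme: editor?.document.uri.scheme })),
    );
}

export function deactivate(): void {
    flush();  // avoid losing buffered events on shutdown
}
```

Batching writes this way keeps per-event cost negligible, which matters when every editor interaction is instrumented.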
I also designed and ran controlled experiments comparing LLM-generated tests against baseline test templates, evaluating fault-detection capability, line and branch coverage, and debugging efficiency. Through this work, I learned that higher coverage does not necessarily imply better tests and that assertion correctness is often the limiting factor in automated test generation. These findings challenged my initial assumptions and motivated a deeper investigation into evaluation metrics.
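The toy example below illustrates that coverage-versus-assertion gap; the function, the injected fault, and both tests are hypothetical and not taken from the study. A test can execute every line and branch of a faulty function yet still pass because its assertions are too weak to detect the fault.

```typescript
// Toy illustration: full coverage with weak assertions misses a fault
// that exact assertions catch. All names here are hypothetical examples.
import { strict as assert } from 'node:assert';

// Faulty implementation: should clamp to [min, max] but returns the wrong bound.
function clamp(value: number, min: number, max: number): number {
    if (value < min) return max;   // fault: should return min
    if (value > max) return max;
    return value;
}

// "High coverage" test: executes every line and branch, but only checks the
// return type, so the fault goes undetected and the test passes.
function weakTest(): void {
    assert.equal(typeof clamp(-5, 0, 10), 'number');  // below-min branch
    assert.equal(typeof clamp(15, 0, 10), 'number');  // above-max branch
    assert.equal(typeof clamp(5, 0, 10), 'number');   // in-range branch
}

// Same inputs, but expected values are asserted exactly, so the below-min
// case fails and the fault is detected.
function strongTest(): void {
    assert.equal(clamp(-5, 0, 10), 0);   // fails: faulty clamp returns 10
    assert.equal(clamp(15, 0, 10), 10);
    assert.equal(clamp(5, 0, 10), 5);
}

weakTest();  // passes despite full line and branch coverage
try {
    strongTest();
} catch (e) {
    console.log('strong assertion caught the fault:', (e as Error).message);
}
```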
This research experience taught me how to design experiments, reason about noisy data, and communicate tradeoffs clearly. It shaped how I think about AI-assisted developer tools by emphasizing careful evaluation and calibration rather than raw automation, and it gave me a strong appreciation for rigorous methodology in software engineering research.