Four months ago, a South Carolina lawyer watched Claude Code do something that unsettled him. He describes it as "pretty powerful and scared the shit out of me." His response was to spend the following months building serious AI infrastructure for his 11-year-old solo firm - while still running cases.
The result: a server running 10 NVIDIA V100 GPUs with 320GB of combined VRAM (video memory used to load and run AI models - more VRAM means room for larger models and longer contexts, not necessarily faster output). He's running it headless on Linux (no monitor, operated entirely from the command line) with vLLM, an open-source engine for serving large language models efficiently. He posted asking for tips while preparing for trial next week.
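To make the 320GB figure concrete, here is a rough sizing sketch rather than a description of his actual setup. It assumes fp16 weights at 2 bytes per parameter, a 20% VRAM reserve for cache and overhead, and a hypothetical 70-billion-parameter model with 64 attention heads; the model name is a placeholder. It also reflects two constraints worth knowing on this hardware: vLLM's tensor parallelism needs a GPU count that divides the model's attention heads evenly (10 usually does not), and V100s lack bfloat16 support, so the dtype is set to half.

```python
from dataclasses import dataclass

BYTES_PER_PARAM_FP16 = 2  # fp16 stores each weight in 2 bytes


@dataclass(frozen=True)
class Cluster:
    gpus: int
    vram_gb_per_gpu: int

    @property
    def total_vram_gb(self) -> int:
        return self.gpus * self.vram_gb_per_gpu


def fp16_weight_gb(params_billions: float) -> float:
    """Approximate weight size in GB: 1e9 params * 2 bytes = 2 GB per billion."""
    return params_billions * BYTES_PER_PARAM_FP16


def fits(cluster: Cluster, params_billions: float, usable: float = 0.8) -> bool:
    """True if the weights fit in the usable share of VRAM.

    The 0.8 default is an assumed reserve for KV cache and overhead.
    """
    return fp16_weight_gb(params_billions) <= cluster.total_vram_gb * usable


def parallel_split(gpus: int, attention_heads: int) -> tuple[int, int]:
    """Pick (tensor_parallel, pipeline_parallel) so that tp * pp == gpus
    and tp divides the head count, preferring the largest valid tp."""
    for tp in range(gpus, 0, -1):
        if gpus % tp == 0 and attention_heads % tp == 0:
            return tp, gpus // tp
    raise ValueError("gpus must be a positive integer")


def vllm_serve_args(model: str, tp: int, pp: int) -> list[str]:
    """Build a `vllm serve` command line; --dtype half because V100 has no bf16."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tp),
        "--pipeline-parallel-size", str(pp),
        "--dtype", "half",
    ]


box = Cluster(gpus=10, vram_gb_per_gpu=32)
tp, pp = parallel_split(box.gpus, attention_heads=64)  # hypothetical model shape
command = vllm_serve_args("your-org/your-model", tp, pp)  # placeholder model name
print(box.total_vram_gb, fits(box, 70), " ".join(command))
```

With these assumptions the split comes out as two-way tensor parallelism across five pipeline stages. Whether that is the best layout for a given model is something to benchmark; another option is to run on 8 of the 10 cards with pure tensor parallelism.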
The use case is practical: automating paralegal work. Document review, drafting routine correspondence, organizing case files - the cognitive tasks that eat attorney hours but don't require legal judgment. A server configuration like this is a substantial capital expense, but set against the cost of paralegal labor over several years, the economics can work for a busy solo practice.
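The economics claim reduces to a simple break-even calculation. The sketch below is illustrative only: the function name is made up for this example, and every dollar figure in the usage line is a placeholder, not a number from the lawyer's post.

```python
import math
from typing import Optional


def breakeven_months(
    hardware_cost: float,
    monthly_running_cost: float,
    monthly_labor_saved: float,
) -> Optional[int]:
    """Months until cumulative net savings cover the up-front hardware cost.

    Returns None when running costs meet or exceed the labor saved,
    because the purchase then never pays for itself.
    """
    net_monthly = monthly_labor_saved - monthly_running_cost
    if net_monthly <= 0:
        return None
    return math.ceil(hardware_cost / net_monthly)


# Placeholder inputs: $30,000 of hardware, $500/month in power and upkeep,
# $2,000/month of paralegal-type work taken off the attorney's plate.
print(breakeven_months(30_000, 500, 2_000))
```

The calculation leaves out the setup time itself, which is a real cost for anyone billing by the hour.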
He's honest about the learning curve: "I've probably wasted more time than I gained" in getting the infrastructure running. That's a common and rarely admitted experience for professionals investing in local AI without an IT background. Linux administration, GPU driver management, vLLM configuration, and model selection each has its own steep learning curve, and none of them is intuitive for someone whose expertise is trial law.
What he built reflects a pattern showing up across small professional practices. Lawyers, accountants, and consultants with repetitive, document-heavy workflows are choosing to own their AI infrastructure outright rather than pay per API call for sensitive work. The on-premise approach keeps client data off third-party servers, which matters in regulated industries where client confidentiality is a professional obligation, not a preference.