Job Title: AI Engineer
Location: Remote
Responsibilities:
1. Deploy, optimize, and maintain on-premises large models (e.g., Llama, DeepSeek), ensuring efficient and stable operation within internal networks or local environments.
2. Research and implement MCP (Model Context Protocol) features; develop and maintain MCP plugins to enable interaction with and extension of large models.
3. Optimize inference performance through model quantization, parallel computing, and inference acceleration techniques.
4. Build and maintain large model runtime environments, including GPU clusters, containerized deployments, and VDI/internal network adaptations.
5. Support business teams in integrating large models by providing API/SDK interfaces and ensuring service security and high availability.
6. Monitor cutting-edge developments in large models and AI infrastructure, proposing optimization solutions aligned with business requirements.
Qualifications:
1. Bachelor’s degree or higher in Computer Science, Artificial Intelligence, or a related field.
2. Proficient with common large model frameworks and inference engines (PyTorch, TensorFlow, vLLM, LMDeploy, llama.cpp, etc.), with hands-on deployment experience.
3. Experience with on-premises inference and service deployment using Docker, Kubernetes, VMware, or local clusters.
4. Experience with MCP or other large model extension protocols/plugins is a plus.
5. Familiarity with GPU compute optimization, including model quantization, parallel inference, and distributed training/inference.
6. Strong Python skills; proficiency in Go or Node.js (or another backend language) is a plus.
7. Excellent analytical and problem-solving abilities, capable of independently setting up and optimizing large model environments.
Preferred Qualifications:
1. Experience with AI Agents, LangChain, AutoGPT, or similar projects.
2. Knowledge of private deployment practices and the security/compliance requirements of on-premises AI solutions.
3. Contributions to open-source projects or active participation in technical communities.