# Build Your Personal Agent with Lark *2026-05-24 · [Yuyang Ding](https://yyding1.github.io/)* *First post in an ongoing series on **Project Milo**, the personal chat agent. See the [Vision](./vision.md) for the bigger picture.* Lark chat --- ## What It Can Do ### Use Lark in plain language Through `lark-cli`, the agent can operate the Lark surface area your account can access, including: - **Calendar**: "book 30 min with Miko next Tuesday afternoon" - **Docs and Wiki**: "summarize this doc and send it to me" - **Instant Messaging**: "what did Miko say about the launch last week?" - **Mail**: "draft a polite decline to this Friday invite" - **Meetings and Minutes**: "what did I commit to in yesterday's standup?" - **Bitable, Sheets, Tasks, Contacts, Drive, Approval, Whiteboard, OKR, and more**: plug-and-play skill packs ### Remember the person, not just the chat Two layers of persistence: - **Per-chat transcript**: a JSON message log per `chat_id`, preserving OpenAI-shaped assistant/tool linkage so multi-turn conversations continue cleanly. - **Long-term memory**: model-written files under the configured `memory_dir` (`/workspace/memory/` in the container for the `local_attach` deployment, or `~/.uni-agent/app/lark_chat/memory/` on the host for `local_native`) — `profile.md`, `preferences.md`, and topic notes. The transcript captures what happened recently in this chat. Memory captures what should remain true tomorrow: your name, timezone, team, projects, language preference, recurring constraints, and the people you often mention. If history is trimmed, the process restarts, or the container is recreated, memory still lives where it was written. The next Lark message picks up from there. ### Bring your own model The app talks to any OpenAI-compatible chat-completions endpoint. Self-hosted serving stacks like vLLM or SGLang, public APIs, or an internal model gateway all work the same way. The model is just config: ```yaml model: base_url: http://localhost:8000/v1 name: Qwen/Qwen3.6-35B-A3B api_key: EMPTY ``` Big GPU box? Run the bigger model. Laptop demo? Point it at a smaller endpoint. The Lark integration stays the same. --- ## Step 0: Prerequisites and dependencies - macOS or Linux - Docker (used by the recommended `local_attach` deployment; not strictly required if you choose `local_native`) - An OpenAI-compatible chat-completions endpoint - A Lark/Feishu app in the [Lark Open Platform](https://open.feishu.cn) Host-side dependencies are intentionally minimal. Clone the repo, create a virtualenv, install: ```bash git clone https://github.com/yyDing1/uni-agent.git cd uni-agent python3 -m venv .venv source .venv/bin/activate pip install swe-rex pydantic loguru orjson aiohttp openai pexpect pyyaml ``` ## Step 1: Create a Lark bot **Create a Lark bot and authorize it.** - `npm install -g @larksuite/cli` to install the CLI; `lark-cli --version` to verify. - `lark-cli config init --new`, creates a new app in the [Lark Open Platform](https://open.feishu.cn) (browser flow, captures `app_id` / `app_secret`). - `lark-cli auth login`, OAuth device flow that binds the app to your Feishu account and grants its scopes. The agent acts as this bot; its reach is exactly the scopes you authorized. **Deploy in a sandbox (Optional).** Running LLM-generated shell directly against your host is risky on a personal machine. For isolation, spin up a Docker container first and run the same setup as above inside it (plus a `swerex.server` so the host process can attach over HTTP): ```bash docker rm -f lark-chat-sandbox 2>/dev/null docker run -d --name lark-chat-sandbox -p 18000:18000 \ -v ~/.uni-agent/app/lark_chat/workspace:/workspace \ nikolaik/python-nodejs:python3.12-nodejs22-bookworm tail -f /dev/null docker exec -it lark-chat-sandbox bash -lc ' set -e npm install -g @larksuite/cli pip install swe-rex lark-cli config init --new lark-cli auth login lark-cli auth status' docker exec -d lark-chat-sandbox bash -lc ' python3 -m swerex.server --host 0.0.0.0 --port 18000 --auth-token CHANGEME' ``` `CHANGEME` can be any string; just match it in Step 2. ## Step 2: Configure Open `app/lark_chat/config.local_native.yaml`, the default config: ```yaml deployment: type: local_native startup_timeout: 60.0 memory_dir: ~/.uni-agent/app/lark_chat/memory model: base_url: http://localhost:8000/v1 name: Qwen/Qwen3.6-35B-A3B api_key: EMPTY sampling_params: temperature: 1.0 top_p: 0.95 presence_penalty: 1.5 top_k: 20 repetition_penalty: 1.0 tools: - execute_bash - lark-cli - str_replace_editor - finish skills_dir: ~/.agents/skills transcripts_dir: ~/.uni-agent/app/lark_chat/transcripts agent: action_timeout: 60 max_steps_per_turn: 20 history_max_tokens: 128000 # compaction trigger history_target_tokens: 32000 # post-compaction size ``` What each section does: - `deployment`: where the agent's bash session runs. `local_native` runs in-process via `pexpect`; `startup_timeout` caps boot. - `memory_dir`: long-term notes the agent writes (profile, preferences, topic memos). Survives restarts. - `model`: any OpenAI-compatible chat-completions endpoint. Two common setups: - **Self-hosted**: serve a model with vLLM or SGLang and point `base_url` at it (`api_key` can be any non-empty string). - **Hosted API**: e.g. Doubao Seed on Volc Ark, set `base_url: https://ark.cn-beijing.volces.com/api/v3`, `name: doubao-seed-1-6-250615` (or your model id), `api_key: `. - `sampling_params`: forwarded to the chat-completions call. - `tools`: built-in tool wrappers exposed to the model. - `skills_dir`: where `SkillsManager` finds skill packs (`lark-im`, `lark-base`, …). - `transcripts_dir`: per-chat JSON trajectory logs (model messages + tool calls + results). - `agent`: per-action timeout, max model steps per Lark turn, and a hysteresis-style history budget — `history_max_tokens` is the compaction trigger, `history_target_tokens` the size we compact back down to. While we stay under the trigger, the history is forwarded unchanged so the model server's prefix KV cache hits across turns. **For `local_attach`**, edit `config.local_attach.yaml` and override just the `deployment` block + `memory_dir`; everything else stays the same: ```yaml deployment: type: local_attach container: lark-chat-sandbox # match docker run --name swerex: host: http://127.0.0.1 port: 18000 auth_token: CHANGEME # match swerex.server --auth-token post_setup_cmd: cd /workspace memory_dir: /workspace/memory # container path; bind-mount /workspace to a host dir to persist memory across container restarts. ``` ## Step 3: Run ```bash python -m app.lark_chat.main ``` That picks up `config.local_native.yaml`. For `local_attach`, pass the config explicitly: ```bash LOCAL_ATTACH_AUTH_TOKEN=CHANGEME \ python -m app.lark_chat.main --config app/lark_chat/config.local_attach.yaml ``` (`LOCAL_ATTACH_AUTH_TOKEN` is only needed when `deployment.swerex.auth_token` is `null` in the YAML.) You're live when you see: ```text Entering chat loop. Send a Lark message to the bot. ```