← Back to PRs

#22884: feat(msteams): add resumable upload sessions with retry and range-aware resume

by chansuke open 2026-02-21 19:51 View on GitHub →
channel: msteams size: L
## Summary Describe the problem and fix in 2–5 bullets: - Problem: MS Teams Graph file uploads had one-shot behavior that was fragile for large files and could send incorrect continuation ranges after interrupted chunk scenarios. - Why it matters: Large uploads are common for Teams attachments, and transient failures/throttling can cause user-visible delivery failures. - What changed: Added shared upload flow for OneDrive/SharePoint, simple upload for <=4MB, resumable session uploads for larger files, bounded retry handling (429/503/504 + retry-after), and `nextExpectedRanges`-driven continuation on `202`. - What did NOT change (scope boundary): No changes to sharing-link permissions, auth scopes, channel routing, or non-MSTeams integrations. ## Change Type (select all) - [x] Bug fix - [x] Feature - [x] Refactor - [ ] Docs - [ ] Security hardening - [ ] Chore/infra ## Scope (select all touched areas) - [ ] Gateway / orchestration - [ ] Skills / tool execution - [ ] Auth / tokens - [ ] Memory / storage - [x] Integrations - [x] API / contracts - [ ] UI / DX - [ ] CI/CD / infra ## Linked Issue/PR - Closes #N/A - Related #N/A ## User-visible / Behavior Changes - MS Teams file uploads larger than 4MB now use Graph resumable upload sessions instead of only simple upload. - Under transient upload interruptions, continuation now follows Graph `nextExpectedRanges` from `202` responses, reducing `Content-Range` mismatch failures. - Upload requests now retry boundedly on retryable status codes and transient fetch errors. ## Security Impact (required) - New permissions/capabilities? (`Yes/No`) No - Secrets/tokens handling changed? (`Yes/No`) No - New/changed network calls? (`Yes/No`) Yes - Command/tool execution surface changed? (`Yes/No`) No - Data access scope changed? (`Yes/No`) No - If any `Yes`, explain risk + mitigation: - Large-file uploads now call Graph `createUploadSession` and chunk `PUT` endpoints. Risk is additional network retries; mitigation is bounded retries with retry-after support and no new token scope requirements. ## Repro + Verification ### Environment - OS: Linux (local workspace) - Runtime/container: Node 22 + pnpm 10 - Model/provider: N/A - Integration/channel (if any): MSTeams (Graph upload path) - Relevant config (redacted): Test-only mocked `fetchFn` and token provider ### Steps 1. Run `pnpm vitest run extensions/msteams/src/graph-upload.test.ts`. 2. Confirm simple upload test passes for <=4MB payloads. 3. Confirm resumable upload test passes where server returns `nextExpectedRanges` that differs from local chunk end. ### Expected - Upload logic uses resumable sessions for >4MB and continues from Graph-provided `nextExpectedRanges`. ### Actual - Tests pass and verify `Content-Range` progression follows server continuation offset. ## Evidence Attach at least one: - [x] Failing test/log before + passing after - [ ] Trace/log snippets - [ ] Screenshot/recording - [ ] Perf numbers (if relevant) ## Human Verification (required) What you personally verified (not just CI), and how: - Verified scenarios: Ran `pnpm vitest run extensions/msteams/src/graph-upload.test.ts`; validated simple upload path and resumable continuation behavior. - Edge cases checked: Continuation offset taken from `nextExpectedRanges` instead of local `endExclusive`. - What you did **not** verify: Live upload against a real Microsoft tenant; production throttling behavior with real `retry-after` values. ## Compatibility / Migration - Backward compatible? (`Yes/No`) Yes - Config/env changes? (`Yes/No`) No - Migration needed? (`Yes/No`) No - If yes, exact upgrade steps: ## Failure Recovery (if this breaks) - How to disable/revert this change quickly: Revert commits introducing resumable-session flow and continuation parsing (`MSTeams: add resumable upload session for large files`, `MSTeams: honor nextExpectedRanges in resumable upload`). - Files/config to restore: `extensions/msteams/src/graph-upload.ts`, `extensions/msteams/src/graph-upload.test.ts`. - Known bad symptoms reviewers should watch for: `416`/range mismatch errors, repeated chunk retries without forward progress, large-file uploads failing while small files succeed. ## Risks and Mitigations List only real risks for this PR. Add/remove entries as needed. If none, write `None`. - Risk: Strict parsing of `nextExpectedRanges` may reject unexpected server payload shapes. - Mitigation: Fail fast with explicit error message and response body context for debugging. - Risk: Retry logic can increase completion latency during service degradation. - Mitigation: Retries are bounded and honor `retry-after` to reduce aggressive retry behavior. <!-- greptile_comment --> <h3>Greptile Summary</h3> Added resumable upload session support for MS Teams Graph file uploads larger than 4MB with retry handling and `nextExpectedRanges`-driven continuation. The implementation refactored shared upload logic for both OneDrive and SharePoint, with bounded retry on retryable status codes (429/503/504) and retry-after support. - Introduced `uploadWithResumableSession` for chunked uploads with 5MB chunks (Graph-compliant 320 KiB multiple) - Added `fetchWithRetry` with exponential backoff and retry-after header parsing for HTTP date and numeric formats - Refactored `uploadToOneDrive` and `uploadToSharePoint` to share common `uploadToDrive` logic with scope discrimination - Implemented `parseNextExpectedRangeStart` to extract continuation offset from Graph 202 responses - One critical issue: infinite loop vulnerability if server returns 202 with no forward progress (`nextStart == start`) <h3>Confidence Score: 3/5</h3> - safe to merge after addressing the infinite loop vulnerability in resumable upload - the implementation is well-structured with comprehensive retry handling and proper Graph API integration, but contains a critical infinite loop vulnerability when server returns 202 with nextStart equal to current start position (no forward progress). Test coverage validates the happy path but doesn't cover the stuck-progress edge case. Once the progress guard is added, this becomes a solid enhancement. - `extensions/msteams/src/graph-upload.ts` line 261-296 needs the infinite loop guard added <sub>Last reviewed commit: 8fa5f6f</sub> <!-- greptile_other_comments_section --> <sub>(4/5) You can add custom instructions or style guidelines for the agent [here](https://app.greptile.com/review/github)!</sub> <!-- /greptile_comment -->

Most Similar PRs