Add optional PDF and Office document readers #36

Open
opened 2026-04-06 22:28:51 +00:00 by archeious · 0 comments
Owner

The agent currently sees only filename and size for binary document formats. Add optional content extraction as lazy deps (gated behind --install-extras):

  • pdfminer or pypdf for PDF text extraction
  • openpyxl for Excel schema and sheet enumeration
  • python-docx for Word document text

Particularly valuable for the documents and data domains.
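
A minimal sketch of how the lazy-dependency pattern could look, assuming each reader imports its library only when called and falls back to today's filename-and-size view when the extra is missing. The names `optional_import`, `describe_file`, and `READERS` are hypothetical and not taken from the luminos codebase. The `--install-extras` gating itself is not shown. `pypdf` is used for the PDF case here, and `pdfminer` would slot in the same way.

```python
import importlib
import os
from typing import Any, Callable, Dict, Optional


def optional_import(module_name: str) -> Optional[Any]:
    """Return the imported module, or None if the optional extra is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def _read_pdf(path: str) -> Optional[str]:
    pypdf = optional_import("pypdf")
    if pypdf is None:
        return None
    reader = pypdf.PdfReader(path)
    # extract_text() can yield nothing for image-only pages, hence the "or ''"
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _read_xlsx(path: str) -> Optional[str]:
    openpyxl = optional_import("openpyxl")
    if openpyxl is None:
        return None
    wb = openpyxl.load_workbook(path, read_only=True)
    try:
        lines = []
        for name in wb.sheetnames:
            # Treat the first row of each sheet as its schema (an assumption)
            rows = wb[name].iter_rows(min_row=1, max_row=1, values_only=True)
            lines.append(f"{name}: {list(next(rows, ()))}")
        return "\n".join(lines)
    finally:
        wb.close()


def _read_docx(path: str) -> Optional[str]:
    docx = optional_import("docx")  # python-docx is imported as "docx"
    if docx is None:
        return None
    return "\n".join(p.text for p in docx.Document(path).paragraphs)


# Hypothetical dispatch table: file extension -> lazy reader
READERS: Dict[str, Callable[[str], Optional[str]]] = {
    ".pdf": _read_pdf,
    ".xlsx": _read_xlsx,
    ".docx": _read_docx,
}


def describe_file(path: str) -> Dict[str, Any]:
    """Return name and size always; add extracted content when a reader is available."""
    info: Dict[str, Any] = {
        "name": os.path.basename(path),
        "size": os.path.getsize(path),
        "content": None,
    }
    reader = READERS.get(os.path.splitext(path)[1].lower())
    if reader is not None:
        try:
            info["content"] = reader(path)
        except Exception:
            # Corrupt or unsupported file: keep the metadata-only view
            info["content"] = None
    return info
```

With none of the extras installed, `describe_file` returns the same name and size the agent sees today, with `content` left as `None`, so the readers stay strictly optional.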

archeious added this to the Agentic Investigation Engine project 2026-04-06 22:33:59 +00:00
Reference: archeious/luminos#36