Standard RAG pipelines treat documents as flat strings of text. They use "fixed-size chunking" (cutting a document every 500 ...
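As a rough illustration of the fixed-size chunking described above, here is a minimal sketch. The helper name `fixed_size_chunks` is hypothetical. The source does not say what unit the 500 refers to, so counting characters is an assumption made only for this example.

```python
def fixed_size_chunks(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters.

    Assumption: the unit is characters; the source elides the actual unit.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Slice at every multiple of `size`, ignoring headings, sentences,
    # and any other structure in the text.
    return [text[i:i + size] for i in range(0, len(text), size)]


# A toy document: a heading line followed by 250 repetitions of "word ".
document = "Heading\n" + "word " * 250
chunks = fixed_size_chunks(document)
```

Because the cut points depend only on position, a chunk boundary can fall in the middle of a word or sentence, which is the "flat string" behaviour the description refers to.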
Given a file (or some information about a file), return a set of standardized tags identifying what the file is. This is a Rust port of the Python identify library. What is the type: file, symlink, ...
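The "what is the type" question can be sketched with the Python standard library alone. This is not the identify library's API or the Rust port's code; `basic_type_tag` is a hypothetical helper, and it covers only the two types the description names (file and symlink), reporting everything else as "other" rather than guessing at the elided list.

```python
import os
import stat


def basic_type_tag(path: str) -> str:
    """Return a coarse type tag for `path` (hypothetical helper)."""
    # lstat inspects the path itself and does not follow symlinks,
    # so a symlink is reported as a symlink, not as its target.
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "file"
    # Types beyond those named in the description are not enumerated here.
    return "other"
```

Calling `basic_type_tag` on a regular file returns "file", and on a symbolic link it returns "symlink" even when the link points at a regular file.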