Even AI Struggles to Understand Excel Sheets - Microsoft Steps in to Help

If you have trouble looking through Excel spreadsheets and would rather have an AI chatbot work out the meaning of every row and column, Microsoft may hold the key to helping large language models (LLMs) better understand spreadsheets.

You are not alone: AI is also known to have a hard time processing spreadsheets. Their vast grids and varied cell formats are hurdles that LLMs must overcome.

Now a group of researchers at Microsoft think they may have found a solution that optimizes how LLMs decipher spreadsheets.

In a preprint paper submitted on July 12, the researchers presented SpreadsheetLLM, a new method that combines encoding and compression with a leading AI chatbot to process spreadsheets more efficiently.

According to their data, the GPT-4 model improved by 27% in spreadsheet table detection and by nearly 26% in in-context learning performance when using their method. The method also cut costs by up to 96%, based on GPT-4 and GPT-3.5 Turbo prices.

The method may be integrated into Copilot for Microsoft 365 in the future, making it easier than ever to make sense of data.

The key to the success of SpreadsheetLLM is Microsoft's SheetCompressor, an encoding framework that effectively compresses spreadsheets for LLMs.

SheetCompressor is composed of three modules: one to make the spreadsheet more readable for the LLM, another to bypass empty cells and repeated numbers, and a third to make it easier for the LLM to understand the meaning of numbers (whether they are years, phone numbers, etc.).
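The second and third ideas can be illustrated with a minimal Python sketch. This is not Microsoft's implementation: the function names (`classify`, `inverted_index`), the type labels such as `<Year>`, and the sample grid are all assumptions made for illustration. The sketch skips empty cells, groups cells that share a value, and replaces concrete numbers with a coarse type label.

```python
import re
from collections import defaultdict


def classify(value: str) -> str:
    """Swap a concrete number for a coarse type label (labels are hypothetical)."""
    if re.fullmatch(r"(19|20)\d{2}", value):
        return "<Year>"
    if re.fullmatch(r"-?\d+", value):
        return "<Int>"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "<Float>"
    return value  # plain text is kept as-is


def inverted_index(grid: list[list[str]]) -> dict[str, list[str]]:
    """Map each distinct labeled value to the cells that hold it, skipping empty cells."""
    index = defaultdict(list)
    for r, row in enumerate(grid, start=1):
        for c, cell in enumerate(row):
            if cell == "":
                continue  # empty cells cost no tokens in this encoding
            address = f"{chr(ord('A') + c)}{r}"  # handles columns A-Z only
            index[classify(cell)].append(address)
    return dict(index)


grid = [
    ["Year", "Sales", ""],
    ["2021", "100", ""],
    ["2022", "100", ""],
]
encoded = inverted_index(grid)
print(encoded)
```

Here nine cells collapse into four entries, because the empty column disappears and the two year cells and the two sales cells each share one entry.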

This compression method reduced token usage in spreadsheet encoding by 96%. It also significantly improved performance on larger spreadsheets, where the challenge of high token usage is felt most.
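As a quick piece of arithmetic, a 96% reduction in tokens is equivalent to a compression ratio of about 25 to 1. The short Python sketch below works this out; the helper name `compression_ratio` and the 100,000-token sheet are made-up values for illustration, not figures from the paper.

```python
def compression_ratio(token_reduction: float) -> float:
    """Convert a fractional token reduction (0.96 = 96%) into an N-to-1 ratio."""
    return 1 / (1 - token_reduction)


ratio = compression_ratio(0.96)
print(round(ratio))  # 25

# A hypothetical sheet that needs 100,000 tokens when encoded naively
tokens_before = 100_000
tokens_after = round(tokens_before * (1 - 0.96))
print(tokens_after)  # 4000
```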

In the paper, the authors state that they created “Chain of Spreadsheet,” an extension of the framework that identifies the table relevant to a question and determines the boundaries of the relevant content. The question and the cropped data are then presented to the LLM again, which processes the cropped information to generate an answer.

Feeding a typical spreadsheet directly into a model often exceeds the token limits of conventional models. Chain of Spreadsheet helps LLMs focus only on the areas relevant to the question, cutting unnecessary data and keeping the LLM efficient.
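The two-stage flow described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' code: `ask_llm` stands in for any function that sends a prompt to a model, the prompt wording and the "top,left,bottom,right" reply format are invented here, and `fake_llm` is a stub used so the example runs without a real model.

```python
from typing import Callable


def crop(grid: list[list[str]], top: int, left: int, bottom: int, right: int) -> list[list[str]]:
    """Return the sub-grid inside zero-based, inclusive boundaries."""
    return [row[left:right + 1] for row in grid[top:bottom + 1]]


def chain_of_spreadsheet(grid: list[list[str]], question: str,
                         ask_llm: Callable[[str], str]) -> str:
    # Stage 1: ask the model which region of the sheet the question refers to.
    reply = ask_llm(f"Sheet: {grid}\nQuestion: {question}\nReply as top,left,bottom,right")
    top, left, bottom, right = (int(part) for part in reply.split(","))
    # Stage 2: send the question again with only the cropped region.
    table = crop(grid, top, left, bottom, right)
    return ask_llm(f"Table: {table}\nQuestion: {question}\nAnswer:")


prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stub model (assumption): fixed boundaries in stage 1, fixed answer in stage 2."""
    prompts.append(prompt)
    return "0,0,2,1" if prompt.startswith("Sheet:") else "Ann"


grid = [
    ["Name", "Score", "", "City", "Population"],
    ["Ann", "90", "", "Oslo", "700000"],
    ["Bob", "85", "", "Rome", "2800000"],
]
answer = chain_of_spreadsheet(grid, "Who scored highest?", fake_llm)
print(answer)
```

In this toy run the second prompt contains only the name and score columns, so the unrelated city table never reaches the answering step.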

One limitation noted by the Microsoft researchers for the current method is that it still cannot handle spreadsheet formatting details such as background colors and borders.

None of this matters much to the average user right away. But if newer versions of chatbots such as ChatGPT or Claude incorporate Microsoft's SpreadsheetLLM, we could quickly upload an entire spreadsheet, ask the chatbot questions in layman's terms, and receive a summary or analysis of the data in the uploaded file.
