Drive step-function increases in data literacy and self-service
In recent years, many organizations have invested the time and effort to build a data warehouse and more robust reporting functionality. Despite this investment, growth-stage companies often still find a gap between the data and the leaders across the organization who need to use it. It’s no wonder: data is complex; many people in critical decision-making roles don’t have the technical background required to decipher it; and data scientists who can bridge the gap are in short supply. While asking good questions and making good decisions will likely require human judgment for some time, LLMs are already quite adept at other parts of the analytical process. For example, these models have an impressive ability to translate business questions into SQL or Python code that retrieves data, performs calculations, creates charts and graphs, and summarizes results in narrative or bullet form. We expect that chat interfaces will also help educate workers about data concepts in context and on demand. (LLMs will happily explain the difference between mean and median as often as needed!) That said, while the potential for improvements in data literacy and self-service is significant, we recommend that leaders tread carefully. Models cannot easily distinguish fact from fiction; LLMs tend to “hallucinate,” confidently making up answers, and struggle to calculate accurately. Use with caution in the near term.
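To make the text-to-SQL pattern above concrete, here is a minimal sketch of how a business question and a table schema might be assembled into an LLM prompt. The schema, question, and `build_sql_prompt` helper are illustrative assumptions, not any specific product’s API; the actual call to a model is left out.

```python
# Hypothetical sketch of the "business question to SQL" pattern.
# Nothing here is a real vendor API; the prompt text and schema
# are invented for illustration.

def build_sql_prompt(schema: str, question: str) -> str:
    """Assemble a prompt asking an LLM to translate a business
    question into a SQL query against a known table schema."""
    return (
        "You are a data analyst. Given this table schema:\n"
        f"{schema}\n"
        "Write a single SQL query that answers the question below.\n"
        f"Question: {question}\n"
        "Return only the SQL."
    )

schema = "orders(order_id INT, region TEXT, amount NUMERIC, order_date DATE)"
question = "What was total revenue by region last quarter?"
prompt = build_sql_prompt(schema, question)

# The prompt would then be sent to an LLM of your choice; the model's
# reply (a SQL string) is executed against the warehouse as usual.
print(prompt)
```

In practice, teams also validate the generated SQL (for example, restricting it to read-only queries) before running it, which is one way to manage the accuracy concerns noted above.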
Accelerated timeline from question to insight
Many executives can relate to this experience: ask a seemingly simple question about your business, only to find that it may take your team days, weeks, or more to get the answer. Data analysis – including locating data sources, stitching them together, ensuring data quality and (finally) performing the analysis – remains, in most organizations, a high-effort, time-consuming task. Technology has improved the process tremendously in recent years, from cloud data warehouses and data pipelines to drag-and-drop business intelligence software. And yet, there is still plenty of space for generative AI to further improve time to insight. We believe this will happen in two primary ways:
- Code creation: AI tools – including Copilot and other automatic code generation assistants – are easing and accelerating the process of writing the code needed to move, clean, and analyze data.
- Access to larger data sets: LLMs can be tuned on unstructured data sets, allowing business users to query qualitative customer feedback, conversation histories from corporate instant messaging apps, and other data that may currently live outside “traditional” databases.
The result: a wider range of data will become available for analysis. With timely access to broader types of data, we expect business leaders will ask more questions, keeping data engineers busier than ever.
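As a toy illustration of querying unstructured feedback, the sketch below ranks free-text customer comments against a natural-language question. A real system would use LLM embeddings for semantic matching; simple keyword overlap stands in here so the example stays self-contained, and all data is invented.

```python
# Toy stand-in for LLM-backed retrieval over unstructured text.
# Keyword overlap replaces semantic embeddings purely to keep the
# sketch dependency-free; the feedback strings are made up.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in a document (case-insensitive)."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

feedback = [
    "The checkout flow kept timing out on mobile.",
    "Love the new dashboard, but export to CSV is broken.",
    "Support resolved my billing issue quickly.",
]

query = "checkout timing out"
best = max(feedback, key=lambda doc: score(query, doc))
print(best)  # the most relevant piece of feedback
```

The design point is that qualitative text becomes queryable at all; swapping the scoring function for model-generated embeddings changes the quality of the match, not the shape of the workflow.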
Heightened focus on data ownership, data quality and data privacy
The unprecedented performance of ChatGPT and other LLMs – on everything from writing poetry to passing exams – has captured our attention. At the same time, model accuracy and opacity surrounding sources of training data have raised meaningful ethical and legal concerns. To improve the accuracy and impact of these models, business leaders should consider fine-tuning (i.e., providing additional training steps) or prompting (i.e., phrasing a question in a particular way) models using their own internal data. For example, customer service teams can leverage existing documentation and past support solutions to fine-tune a support chatbot, producing responses that are more specific than a general model’s. The quality of the data matters; answers can only be as good as the material that goes into the model.
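The support-chatbot example above usually starts with reshaping past tickets into training pairs. The sketch below shows one common convention, prompt/completion records serialized as JSON Lines; the field names and format are assumptions for illustration, and your provider’s expected schema may differ.

```python
# Hedged sketch: shaping past support tickets into prompt/completion
# pairs for fine-tuning. The tickets are invented, and the JSONL
# field names are a common convention rather than a fixed standard.
import json

tickets = [
    {"question": "How do I reset my password?",
     "resolution": "Use the 'Forgot password' link on the sign-in page."},
    {"question": "Where can I download my invoice?",
     "resolution": "Invoices are under Account > Billing > History."},
]

def to_training_record(ticket: dict) -> str:
    """Serialize one ticket as a JSON line of prompt/completion text."""
    return json.dumps({
        "prompt": ticket["question"],
        "completion": ticket["resolution"],
    })

# One record per line is the usual shape for fine-tuning uploads.
jsonl = "\n".join(to_training_record(t) for t in tickets)
print(jsonl)
```

Note that this step is exactly where data quality enters: stale or incorrect resolutions in the source tickets flow straight into the tuned model’s answers.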
At the same time, we encourage teams to tread very cautiously when deciding which data is used to train a model. Consider who has access to the data and what types of data are shared. Can anyone outside your organization access this information? Are you sharing any trade secrets, personally identifiable information (PII) or other confidential data? This type of information should never be entered into a consumer-level chatbot such as ChatGPT, and leaders need to proactively educate employees about this risk. We expect major cloud players, proprietary solution providers and open-source models alike to increase and improve offerings designed to help companies keep private data private while still taking advantage of the power of generative AI. We also expect regulation – including HIPAA, GDPR and similar frameworks – to play a role in risk management. For the moment, gen AI technology continues to evolve so rapidly that it has outpaced the introduction of new regulatory guidelines, and we expect that will continue for some time. In the meantime, we encourage executives to think proactively about data ownership and data quality, as well as the privacy protection practices that may affect their industry.
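One practical guardrail for the PII risk described above is redacting obvious patterns before any text leaves the organization. The sketch below is deliberately simple, catching only a few regular-expression patterns; real deployments need far more thorough detection, and the patterns and labels here are illustrative assumptions.

```python
# Illustrative PII redaction before text is sent to an external
# chatbot. These three patterns are examples only; production
# systems combine many detectors and human review.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matches of each PII pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

msg = "Customer jane.doe@example.com (SSN 123-45-6789) called about her bill."
print(redact(msg))
# → Customer [EMAIL] (SSN [SSN]) called about her bill.
```

Redaction of this kind complements, rather than replaces, the employee education and vendor-side privacy offerings discussed above.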