Because artificial intelligence (AI) is evolving rapidly, this FAQ is subject to regular updates. Last update: 11/10/2024.
A Large Language Model (LLM) is a machine learning model trained on a vast amount of textual data and capable of performing many tasks, such as translation, text generation and document summarisation. Generative Pre-trained Transformers (GPT) are a sub-category of LLMs that use a specific architecture to generate creative texts. They are trained on a large corpus of text before being refined for particular tasks, for example as chatbots.
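For illustration only, the following minimal sketch shows what generating text with a small pre-trained GPT-style model can look like in practice. It assumes the open source Hugging Face transformers library and the publicly available distilgpt2 model, neither of which is mentioned above, and is not a description of any particular commercial tool.

    # Minimal sketch; assumes the "transformers" library is installed (pip install transformers)
    # and uses distilgpt2, a small publicly available GPT-style model, purely for illustration.
    from transformers import pipeline

    generator = pipeline("text-generation", model="distilgpt2")
    result = generator("Large language models can", max_new_tokens=30)
    print(result[0]["generated_text"])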
These tools have made remarkable inroads: ChatGPT, software developed by OpenAI and now available to the general public, can generate highly convincing texts on a multitude of subjects while communicating in natural language. Many similar tools have emerged since ChatGPT's arrival in November 2022.
They also make it possible to generate a variety of content types, with tools like MidJourney for images and Sora for video.
With the advent of multi-modal models such as GPT-4o or Gemini 1.5, the capabilities of these tools have been considerably expanded. For example, as well as understanding and producing text, these models can now recognise visual elements in images and interact in real time with users. This includes the ability to respond vocally in near real time, to analyse photos captured by a phone's camera and to process live video streams. This versatility paves the way for innovative applications in a variety of fields.
All the major commercial players (Microsoft, Google, Facebook, etc.) offer or are about to offer this type of tool. Although their implementation costs seemed to restrict their development to commercial players with significant financial resources, research has advanced extremely rapidly on this front and numerous open source alternatives have since appeared.
The main reason for this is that these tools are not yet available on the market.
Since 2023, many LLM-type solutions have been deployed by the main players in the Tech world. The main conversational agents currently available include:
Aggregators like Poe allow you to combine the above LLMs via a single subscription: https://poe.com
Some platforms, such as DuckDuckGo, also offer the option of using some of these models without creating an additional account, while actively protecting confidentiality: https://duckduckgo.com/aichat
Hugging Face is an open source platform offering numerous artificial intelligence models. It provides free access to a wide variety of models, often without creating an account, for a wide range of tasks beyond simple conversational agents: https://huggingface.co
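Beyond conversational agents, these openly hosted models can be used for tasks such as summarisation. As a purely illustrative sketch, the snippet below assumes the transformers library and the publicly hosted summarisation model sshleifer/distilbart-cnn-12-6 from the Hugging Face Hub; any comparable model would do.

    # Illustrative sketch: download a summarisation model from the Hugging Face Hub and run it locally.
    # Assumption: the "transformers" library is installed; the model name is one public example among many.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
    text = (
        "Large language models are trained on very large text corpora and can perform tasks "
        "such as translation, summarisation and question answering. They are increasingly "
        "integrated into the software that people use every day."
    )
    print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])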
Alternatively, Perplexity is a search engine that uses AI to provide concise answers to questions, citing sources. It makes it easy to access reliable, well-researched information, leveraging advanced language model APIs such as GPT: https://www.perplexity.ai/
It is very difficult to make predictions about the future of these technologies, as the landscape is changing so rapidly. Numerous semi-autonomous agents based on these Large Language Models (LLMs) are emerging, making it possible to automate and control complex tasks by letting the machine automatically determine and plan a succession of tasks required to achieve a set objective.
In addition, the simplified training of these models means that they can be specialised, starting from pre-trained open source models, to improve their performance on specific tasks. This lowers the barrier to entry posed by training costs and opens up competition that is no longer restricted to the major commercial players.
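To give a concrete idea of what such specialisation involves, here is a minimal, purely illustrative sketch of fine-tuning a small open source model on a domain-specific text file. It assumes the Hugging Face transformers and datasets libraries and the public distilgpt2 model; the file name domain_corpus.txt is a hypothetical placeholder for your own corpus.

    # Illustrative fine-tuning sketch; assumes "transformers" and "datasets" are installed.
    # "domain_corpus.txt" is a hypothetical plain-text file with one training example per line.
    from datasets import load_dataset
    from transformers import (AutoTokenizer, AutoModelForCausalLM, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    model_name = "distilgpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 style models have no padding token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)

    dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
    tokenized = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                            batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="specialised-model",
                               num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=tokenized["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    )
    trainer.train()  # produces a model specialised on the domain corpus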
Whatever the future breakthroughs in this field, however, it is certain that these technologies derived from artificial intelligence will have a major impact, comparable to the impact that other disruptive technologies have had in the past.
Yes, provided that the principles of data protection and confidentiality are respected in all situations: users must avoid using this type of tool to disseminate personal or sensitive data, or data subject to official secrecy. This FAQ sets out the rules that employees, students, teachers and researchers must follow in their work.
Many members of UNIL actively use ChatGPT or other similar tools in their daily work. These tools make it possible to automate a large number of tasks, to facilitate access to information, and to perform tasks such as drafting texts, translating, creating summaries or producing simple computer code.
These same functionalities are also being introduced gradually, and in a more or less transparent and obvious way, into the software used on a daily basis, whether commercial or not. As a result, operating systems, writing and analysis tools, wizards, planning tools, etc. increasingly incorporate these technologies in order to improve the user experience.
It is reasonable to assume that, over time, this trend will continue to strengthen and that the direct or indirect use of these tools will only increase.
Community members wishing to use AI tools, including the in-house development of AI applications, may do so, as long as they comply with the basic principles set out in this FAQ, by using the tools available free of charge or by taking out individual subscriptions if necessary. The IT Centre actively monitors the use of these tools throughout the institution and may be asked to implement broader solutions at the explicit request of an entity (faculty, department, etc.).
Since May 2024, the IT Centre has implemented the Microsoft Copilot solution, which is currently included in our subscription.
You can go to this site https://copilot.microsoft.com/ and log in with your UNIL credentials. You will then have access to the Copilot solution and its chatbot, which works via Microsoft Azure APIs integrating GPT models, hosted on European servers that comply with regional security and data protection standards.
WARNING! The potential uses of this solution in teaching and research are defined by the frameworks put in place by faculties and colleges. UNIL invites you to adhere strictly to these guidelines.
Despite the "Protected" badge, no online AI tool guarantees 100% confidentiality of the data entered into it. As a result, you must not enter into this conversational agent any information that is sensitive, personal or linked to official secrecy (link to the DPO's note on the level of data protection).
Link to access the service and instructions from the IT Centre: AI Virtual Assistant
In addition to its applications in research and teaching, artificial intelligence offers a variety of powerful tools that can simplify many everyday tasks. Here are a few examples of academic use:
Besides these more recent and diversified uses, artificial intelligence continues to play a central role in the management of traditional resources and processes. Here are a few concrete examples:
AI tools, black boxes?
Yes, ChatGPT, Microsoft Copilot, Google Gemini and other self-service tools are, to this day, "black boxes". While it is possible to guess at some of the techniques used to build them, their operation and architecture are not public. It is therefore impossible to audit how they work, which forces users to keep a critical eye on their use. As most of these tools do not provide access to the sources behind the answers they generate, users must check the reliability of those answers themselves.
Data protection and confidentiality
The use of AI tools requires great vigilance in terms of data protection and confidentiality. In the context of UNIL, users must avoid using this type of tool to disseminate data that is personal, sensitive or subject to official secrecy. The aim here is to apply the same judgement and critical eye as when using other internet tools, and to bear in mind that any information made available on a website may be collected for purposes far removed from what users might expect. Particular vigilance is required when using these tools, whose ease of interaction tends to make users forget that they face the same problems encountered when using a search engine such as Google or a translation tool such as DeepL (see Data protection and confidentiality section).
Bias, hallucinations and misinformation
These models, trained on vast and diverse Internet data, reproduce the societal biases present in that data. In addition, they can generate hallucinations, providing information that seems perfectly plausible but is in fact inaccurate or does not exist in the training data. Sometimes they may also provide incorrect information due to limitations in their understanding or the context in which they operate. It is crucial for users to understand these limitations.
The use of artificial intelligence tools has a significant environmental impact due to their energy consumption. For example, interactions with AI models for question-and-answer queries consume 10 to 15 times more energy than a standard Google search. More complex tasks, such as voice recognition or image processing, require more resources, increasing their ecological footprint.
It is important to consider not only the energy consumed when using these tools, but also the energy expended during their development. The training process for advanced AI models such as GPT, Gemini or Meta's models, including open source models, is particularly energy-intensive. The training phase alone can consume as much energy as several thousand households use in a year. This consumption stems from the processing of the vast data sets needed to optimise the model's performance.
A study published in the journal Nature in 2024 provides detailed estimates of the energy consumption of various open source models: https://www.nature.com/articles/d41586-024-02680-3. This data highlights the importance of taking environmental impact into account when developing and using AI tools.
Note from the DPO validated by the Legal Department on the compliance and uses of AI at UNIL, covering the management of sensitive, personal and professional data as well as good usage practices.
Despite the efforts made by major companies to offer data protection options, no online AI tool is currently able to guarantee 100% confidentiality of the data fed into it. When information is fed into an AI, it is transferred from point A (the person's computer) to point B (the servers hosting the AI that processes the information). If the user feeds personal data to an AI (for example, by asking it to analyse extracts from interviews or a file containing socio-demographic data), this generally constitutes a transfer of personal data to a foreign country.
As an example, OpenAI, which is headquartered in the USA, clearly states that by default ChatGPT chats (including the paid version ChatGPT Plus) are stored on their servers in the USA and could be used to train their models. However, the USA is not considered by the Swiss or European data protection authorities as a country offering a level of data protection equivalent to their own (a non-adequate country within the meaning of the law). For the time being, therefore, it is illegal in Switzerland and Europe to feed such AIs with personal and, a fortiori, sensitive data (health data, political opinions, etc.). The recent adoption of a new EU/US data protection agreement, the Data Privacy Framework, is an encouraging step towards regularising the transfer of personal data across the Atlantic. Switzerland may join the scheme in the near future. UNIL remains responsible for the lawfulness of these transfers and must, in particular, ensure that subcontractors are party to the Data Privacy Framework.
In addition, when an AI system processes a set of data, it learns the patterns and configurations present in that data. This means that it is likely to memorise elements of this data and reproduce them in its output. This creates major problems if the AI is trained on personal, sensitive or confidential data. For example, if the software processes medical data, financial information or any other personal data, it could disclose details of this data in the content it produces, even indirectly or in disguised form. In the current configuration of these tools, the protection and confidentiality of data is therefore not respected.
People using these tools must therefore exercise extreme vigilance to ensure that data is protected and kept confidential. This may require simple measures such as informing participants, anonymisation, pseudonymisation, the use of approved data, etc.
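As a purely illustrative sketch (not an approved UNIL procedure), the snippet below shows the kind of simple pseudonymisation that can be applied before submitting text to an online AI tool; real projects require far more robust detection of identifying information.

    # Illustrative pseudonymisation sketch: replace e-mail addresses and phone-like numbers
    # with neutral placeholders before sending text to an external AI service.
    # Note that it does NOT catch personal names, which shows the limits of such naive rules.
    import re

    def pseudonymise(text: str) -> str:
        text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)   # e-mail addresses
        text = re.sub(r"\+?\d[\d .-]{7,}\d", "[PHONE]", text)        # phone-like numbers
        return text

    print(pseudonymise("Contact Jane Doe at jane.doe@unil.ch or +41 21 692 11 11."))
    # -> Contact Jane Doe at [EMAIL] or [PHONE].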
In conclusion, while AI algorithms offer powerful analysis and synthesis capabilities, it is imperative that their use is carried out in compliance with the law. Guaranteeing the protection of personal data and data confidentiality in general is not only an ethical issue, but also a legal obligation.
This is why members of the UNIL community must inform themselves of the potential risks and take all necessary precautions when using these tools.
No. You should avoid associating your UNIL email address or OneDrive UNIL folder with artificial intelligence tools such as ChatGPT for the time being. The only tool you can associate your UNIL email address with is Microsoft Copilot.
As mentioned above, these technologies often use the data supplied to them to train their models, which can create significant risks such as breaches of the confidentiality of personal data, disclosure of sensitive information and breaches of obligations relating to official secrecy. The IT Centre is waiting for solid guarantees from the AI tools regarding their security and respect for confidentiality before considering associating them with a UNIL account.