OpenAI beefs up its computer code-generating AI
In recent months, OpenAI has been generating plenty of buzz with its chatbot ChatGPT, which can answer Internet users' questions in natural language and is capable enough that it is now seen as a threat to Google. But ChatGPT is not the only product OpenAI has in development. Among the company's most promising artificial intelligence systems is Codex. It gets less attention because it is built specifically for developers.
Codex is designed to generate computer code from requests that the user writes in natural language. In programming, it could have a big impact, because it lets companies and developers build apps faster or with fewer programmers. Codex is also already in use: it is the AI behind Copilot, a feature of GitHub, the platform owned by Microsoft (an OpenAI investor).
The technology behind GitHub’s Copilot
On GitHub, Copilot works a bit like autocomplete, but for writing code. Meanwhile, OpenAI keeps improving the technology. According to an article published by Semafor, the company is working with an army of contract developers (rather than employees) to optimize Codex. OpenAI is still a relatively small company compared with the tech giants, with only 375 employees, but it relies heavily on contractors for some of the tasks needed to train its AI. According to Semafor, OpenAI has recently recruited 1,000 contractors to beef up its products.
Of these new contractors, 60% are reportedly tasked with labeling data to feed AI models, while the other 40% are programmers helping to train Codex. Codex has so far been trained on data from GitHub, but OpenAI would also like to use code and comments created specifically for training artificial intelligence. "Mastering more than a dozen programming languages, Codex can now interpret simple natural language commands and execute them on behalf of the user, creating a natural language interface for existing applications," reads OpenAI's description of Codex.
Providing code and feedback
A developer quoted by Semafor said that during a test for OpenAI, he was first given a problem to solve with code and asked to write in English how he intended to approach it. In a second phase, he wrote the code. When a bug was found in that code, he was asked to describe the bug and explain how the code should be fixed. "They most likely want to feed this model with a very specific type of training data, where the human provides a step-by-step presentation of their thought process," the developer added, as quoted by Semafor.
Of course, this information is not official. But it is no secret that OpenAI relies on contractors for this kind of task when developing its models. In a scientific paper published in 2022, for example, the company thanked these contractors: "Finally, we would like to thank all our contractors for providing the essential data for training the models," the paper reads.