Artificial Minds, Human Consequences: Unraveling AI’s Impact on Education, Cognition, and Cultural Production
Empereur Pirate
Posted on October 19, 2024
Pedagogical Limits of Conversational Robots: A Hindrance to Learning?
Pedagogically, conversational robots embody a teaching modality that "does for" the student. This kind of qualitative support can supply models to reproduce, or speed up repetitive tasks the student has already acquired and mastered; generating educational content, by contrast, degrades the effectiveness of tutoring as a vehicle for learning. Instead of encouraging investment in a singular pedagogical relationship, for instance with a more advanced student with whom the learner can identify, language models deliver impersonal content through virtual channels that remain thin compared with a human relationship. The production of educational exercises by automated cognition amounts to a form of pedagogical consumption, driving a wedge between the overall development of the personality and the appropriation of intellectual knowledge. Consulting textbooks or specialized books presents similar drawbacks: solitary learning cut off from the interactive and emotional exchanges of a tutor or teacher. Worse, by simulating the emotional aspects of human communication, qualitative assistance software risks inducing a deep social withdrawal, a disinvestment from the human educational relationships a child's emotional development requires. And if language models are used to supply answers to exercises the student has not mastered, or content unsuited to their level of understanding and cognitive maturity, it is the learning process itself that comes apart.
Risks for Cognitive Development and Student Autonomy
The student's autonomy to experiment, to feel their way by trial and error, could thus be hindered in its normal development by use that is ill-matched to each student's cognitive development. From this point of view, AI-assisted pedagogical tools are not appropriate before the high school diploma or a university degree; their potential for qualitative assistance could instead be introduced during the initiation to doctoral research, to widen the field of accessible knowledge. Among younger primary and secondary students, only children with a disabling learning disorder could genuinely be helped to compensate for a cognitive deficit with software adapted to their difficulties, while the others risk degrading their cognitive performance by substituting technological shortcuts for their own learning abilities. The answer to these major risks for the population's mental health should not be, as it is for "state of the art" models, to degrade the quality of generations for all users through security policies that try to program qualitative assistance robots to refuse certain requests. Not only do these attempts to align AI software with human values such as probity remain vulnerable to easy circumvention, they also mean that engineers teach the AI to disobey user requests. Students who turn to ChatGPT rarely state explicitly that they are cheating on homework. Moreover, in the absence of versions differentiated for adults and children, the same generation request can be harmful to one student's learning and beneficial to a more advanced student seeking consolidating review. And the same student can, as they progress and evolve, express different needs in terms of qualitative assistance.
Supervising AI: Security and Ethical Challenges
Regarding security, policies that consist of teaching artificial intelligence software to disobey, by refusing to respond to certain requests or censoring content deemed offensive, are, on the one hand, contrary to the very principle of safe and reliable use. On the other hand, these security filters degrade the performance of language models: for OpenAI's chatbot, for example, hidden processes called "chains of thought" have appeared, establishing an automated background dialogue between the model and an uncensored version of itself. Security is therefore not guaranteed, since the software may disobey on the basis of implicit instructions generated in an unobservable way. It is problematic, finally, that alignment policies rest on the partial and biased values of engineers employed by profit-driven companies rather than on respect for user requests.
Opacity of AI Functioning and Intellectual Property Issues
Fundamentally, the phenomenon of latent learning through qualitative emergence, that is, the unobservable instructions the model derives for itself from its training and usage data, makes the functioning of AI software largely opaque. In any case, the companies that have raised billions of dollars in funding have shown no willingness to facilitate verification that their training data respects intellectual property. These startups claim to have used all the data available on the Internet, from Wikipedia to pornographic sites, yet complaints have been filed in American courts over the incorporation of non-free content. The New York Times' lawsuit against OpenAI and Microsoft, filed in late December 2023, denounces the unauthorized use of the newspaper's copyrighted content to train AI models, including ChatGPT. The Times alleges that OpenAI and Microsoft exploited millions of articles to build their generative AI products without license or authorization, thereby violating the newspaper's copyrights. The case thus marks a new technological turning point in the difficult adaptation of intellectual property law to evolving uses. The rise of illegal downloading had already undermined the profits of multinational producers of cultural content, exposing the mismatch between their economic model and content sharing on the Internet. The financial stakes led to intense lobbying and monopolistic strategies to control the uses that challenged copyright. With artificial intelligence, the legitimacy of the cultural sector's major players is shaken further still, as musical AIs, for example, become capable of creatively generating pseudo-original content from their training data.
Bypassing Copyright and the Race for Computing Power
Now, however, newer market players, such as the press in the case of the New York Times, see their intellectual property undermined and their content competed against by generations derived from their own data. Faced with this resistance from rights holders to technological progress, the official strategy of AI companies consists, on the one hand, of generating training data from language-model-assisted simulations, and on the other, of increasing the available computing power so as to multiply the iterations required for pseudo-random training, improving the efficiency of language models despite the copyright lock on certain data. These tactics carry a financial and environmental cost for building the infrastructure, with the risk of founding language model architectures on lower-quality data. In this respect, the ambition to build a superintelligence capable of automating scientific research and artistic creation by surpassing human intelligence seems incompatible with excluding from training data all content protected by copyright or patent. Eventually, national libraries may have to digitize all their books to assemble training data for public, state-funded language models. It is also conceivable that the evolving uses of cultural consumption, scientific research, and artistic creation will drift toward a monopolistic position that ends up absorbing rights-holding companies by buying them out, hastening the collapse of the intellectual property system that generative technologies imply.
The Illusion of Creativity: Limits of Generative Models
Generative models, in their creative uses, present combinations of pre-existing content. To that extent, the resulting creations risk lacking innovation or originality and weakening the quality of cultural content. Human creation is not a random phenomenon but, on the contrary, subjective and personal, whereas creative AIs rely on random mechanisms called "stochastic" that aim to reinforce the effectiveness and robustness of learning. During the training phase, the sources of randomness enter at the weight initialization and data sampling steps. At the start of training, the weights of the neural network are generally initialized at random, using pseudo-random number generators (PRNGs) that are deterministic but produce sequences that look random. To ensure the reproducibility of experiments, researchers often use a "seed" to initialize the random number generator. This seed can be derived from the system's internal clock at the start of training, but once set, the "random" sequence it generates will always be the same for that seed. In some cases, particularly in cryptography or when true randomness is required, the system's hardware entropy sources can be used, including very precise time measurements; this is generally not the case for language model training, however.
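To make the determinism of seeded initialization concrete, here is a minimal sketch in PyTorch (the library choice, layer size, and seed values are illustrative assumptions, not details disclosed by any particular model's training setup): fixing the seed makes the "random" initial weights exactly reproducible.

```python
import torch
import torch.nn as nn

def init_layer(seed: int) -> nn.Linear:
    # Fixing the PRNG seed makes the pseudo-random draw deterministic.
    torch.manual_seed(seed)
    return nn.Linear(4, 4)  # weights are drawn pseudo-randomly at creation

a = init_layer(seed=42)
b = init_layer(seed=42)
c = init_layer(seed=7)

# Same seed -> bit-identical "random" weights; different seed -> different weights.
print(torch.equal(a.weight, b.weight))  # True
print(torch.equal(a.weight, c.weight))  # False
```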
The Stochastic Nature of AI Learning
The duration and timing of training can also affect the final model, for example through early stopping strategies or learning rate adjustment. The temporality of training thus remains an important factor in the overall learning process, on top of which come other mechanisms: data sampling, where the order in which training examples are presented to the model is randomized; regularization techniques such as dropout, which introduce randomness during training to avoid overfitting; and stochastic optimizations employing algorithms such as SGD (stochastic gradient descent), which draw random subsets of the data at each iteration.

The nature of information processing in language models therefore differs radically from that of the human brain. AI models, particularly those based on the Transformer architecture, process information in a massively parallel manner: rather than a sequential, step-by-step process, all parts of the input are handled simultaneously, across multiple attention layers. Yet this parallel processing only simulates simultaneity; in reality, the input is split into parts according to a procedural logic that then processes those parts separately, which is the opposite of synchronous analog processing. Indeed, parallel processing in language models is a simulation executed on fundamentally sequential architectures, namely digital processors. The simulation creates the illusion of simultaneous processing, but at bottom there is always an underlying sequentiality. To this simulated parallelism are added decomposition and recomposition processes, during which the input (the prompt) is effectively broken into parts (tokens, embeddings) that are then processed separately through the successive layers of the neural network. This decomposition follows a procedural logic defined by the model's architecture. Processing proceeds layer by layer, each layer operating on the outputs of the previous one; although operations within a layer can be parallelized, progression through the layers remains sequential. Attention mechanisms allow a form of contextual processing, in which each part of the input is related to all the others, but the process remains discrete and iterative, unlike the continuous and truly simultaneous character of an analog system.
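A short PyTorch sketch can make these mechanisms concrete (the toy data, dimensions, and hyperparameters are illustrative assumptions): shuffled mini-batch sampling, dropout, and SGD each inject randomness into training, while the forward pass still traverses the layers strictly in order.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # fix the seed so the "random" choices below are reproducible

# Toy regression data; the sizes are arbitrary illustrative choices.
X, y = torch.randn(256, 16), torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # randomized sampling order

model = nn.Sequential(      # layers are applied strictly one after another
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),      # randomly zeroes activations during training (regularization)
    nn.Linear(64, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent
loss_fn = nn.MSELoss()

for epoch in range(3):
    for batch_x, batch_y in loader:   # each epoch re-shuffles the mini-batches
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)  # forward pass proceeds layer by layer
        loss.backward()
        optimizer.step()
```

Within each Linear layer the matrix arithmetic is parallelized, but the Sequential container hands one layer's output to the next, mirroring the sequential progression through Transformer layers described above.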
Parallel Processing vs. Analog Processing: The Fundamental Difference
Natural language processing by generative assistance models involves a sequential discretization of language into tokens, fundamentally different from the brain's continuous, analog processing of language. The brain can integrate information from different temporal modalities simultaneously in a way that current language models cannot faithfully reproduce. Generative models have no true long-term memory or capacity for continuous learning: each prompt is processed independently, with no "memory" of previous interactions, or, for the most recent advances, only within the limit of a reduced number of elements to process. After training, language models no longer have an internal temporality; they process information statically, on the sole basis of patterns learned during training. A language model does not really follow a linear path through obstacles; one might rather picture a parallel activation of multiple neural "paths", where the relative importance of each path is adjusted point by point according to the context bounded by the input. The human brain, in contrast, combines different temporalities. Synchronous processing of information is carried out simultaneously across different brain regions; diachronic temporality involves the ability to integrate information across time scales, from the distant past to the anticipated future; and the sequential processes of procedural memory make it possible to follow action sequences and learn new procedures. This temporal richness gives the human brain a flexibility and adaptability, born of combining temporalities, that remains out of reach for the most advanced language models.
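The discretization into tokens is easy to observe directly. A minimal sketch using OpenAI's tiktoken library (one tokenizer among many; any byte-pair encoder would illustrate the same point): a sentence is reduced to a finite sequence of integer IDs, the only form in which a model ever receives language.

```python
import tiktoken  # pip install tiktoken

# "cl100k_base" is the byte-pair encoding published for GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Language is continuous; tokens are not.")
print(ids)                             # a short list of integer token IDs
print([enc.decode([i]) for i in ids])  # the discrete fragments the model actually sees
```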