The Role of Natural Language in Driving Innovation in the Legal Industry

Natural language plays important and diverse roles in facilitating the critical innovation that the legal industry of the future will require.

Author: Argyri Panezi, Assistant Professor at IE Law School and Research Fellow at Stanford University

While the debate on whether data is the new oil is ongoing (compare Wired Magazine’s No, Data Is Not The New Oil, February 2019, to The Economist’s The World’s Most Valuable Resource Is No Longer Oil, But Data, May 2017), the rising importance of data is becoming obvious to an increasing number of people. This includes the importance of data access, data ownership, the vast digitization of data, and finally the use of big data to train algorithms and create intelligent machines with ever-growing uses and applications in the business sector and beyond. Alongside its many promises (efficiency, predictive analytics, artificial intelligence), the rise of big data has also brought a number of threats, such as the rise of data monopolies [1], serious challenges for data protection and privacy [2], and the much-discussed phenomenon of algorithmic bias resulting from biased data fed into algorithms [3].

There are many instances of algorithmic uses of big data that have transformed entire sectors. Take, for example, the advertising sector and its use of big consumer data to predict consumer preferences and then facilitate targeted advertising. There is also a rising use of big data in the healthcare sector, leading to critical changes not only in predictive medicine but also in the insurance sector.

Another example is the use of big data and predictive analytics in policing. This article focuses on the use of big data in the legal sector, and specifically on the development of legal tech tools: how legal resources (legislation, judicial decisions, court submissions and other resources, such as legal treatises and scholarship) are used to train algorithms that will provide important tools for lawyers and judges.

Data is relevant for law, and the data revolution is likewise relevant to shaping the future of the legal profession as well as of global legal education. Like many other sectors, the law is affected by big data in many ways: data analytics can help us predict how judges make decisions, and can thus become a powerful tool in the hands of both lawyers and judges. And while we might not be quite ready to accept robot judges, AI treaty drafting or a robot negotiator, the transformation of legal practice in light of legal technology innovations is already underway. Global legal education has also been affected: law school curricula are adapting quickly to provide students with the technical understanding and skills they need to enter a legal world where the management of data is of critical importance, where legal tech startups are proliferating, and where machine learning and artificial intelligence are of the utmost relevance for the future of the profession.


A relatively understudied element of computerized data, especially data relevant to legal technology applications (mostly digitized or born-digital legal resources), is the natural language in which the information exists, and in particular the multilingual nature of legal resources across jurisdictions. English-language legal resources can help build algorithms that could be useful in litigation by, for example, predicting or helping to predict legal outcomes in English courts. This data includes pleadings, other submissions to courts, legal decisions, legislation, regulation, treatises, etc. While the legal world is becoming increasingly global, linguistic diversity among different jurisdictions and legal cultures persists. But is this diversity reflected in the development of legal tech tools?

The challenges of working with different languages in creating, interpreting and applying the law are substantial, even within experienced multilingual supranational structures. Consider one particular example: the EU institutions, which are among the most important multilingual legislative bodies in the world, must grapple with the 24 official languages of their Member States. It is hard enough just to think about the role of language in the European Parliament’s deliberations, or the role of the legal translation services that officially translate new directives and regulations as they come out of the Council of the European Union [4]. Finally, think of the Luxembourg courts hearing pleadings in different languages and publishing decisions in all official languages. While each language has the same legal value within the EU, the languages are not functionally equal (or necessarily equivalent).

Indeed, certain languages are de facto dominant, as legal (and other) professionals gravitate towards one of the few common languages to read, write, and communicate legal texts. Thus, legal tech tools are bound to develop faster for the de facto dominant or widely spoken languages. The same conclusion applies in broader contexts outside the EU. In the global context, especially in certain legal domains (e.g. international contracts), English has been established as the lingua franca. It is thus to be expected that in these domains the development of legal technology tools will also favor data written in English and, in turn, tools that produce results in that same language.

Furthermore, the ethnography of the legally relevant data that is actually gathered and fed into legal technology algorithms to create algorithmic tools is not necessarily representative of the real ethnography of legally relevant data. Dominant languages such as English, Spanish and French seem to have the greatest potential for experimentation and growth in the legal tech space. The volume of data available in these languages is vast, and thus the quality of the resulting algorithmic output continues to rise. At the same time, jurisdictions whose official languages are spoken by fewer people offer a smaller volume of data, which may affect the quality and effectiveness of the algorithms subsequently created. On the flip side, less widespread languages do offer niche markets for legal tech tools: algorithms that can read and process legal resources in languages such as Italian, Czech or Greek. Finally, in order to build such tools, experts from those jurisdictions must be involved to determine which resources are relevant and which are not. This opens a further niche market for legal experts, including legal translators. Indeed, a good national expert can flag relevant legal databases containing legislation and case law, identify good and bad law as well as possible gaps, and suggest how to address them.


Given the language silos between legal jurisdictions, as well as the differences in legal cultures in the broadest sense of the term (the most notable example being the different writing styles of judges across jurisdictions), there is ample room for diverse legal tech tools, just as there is room for a diversity of languages across legal jurisdictions.

The evolution of the legal market drives changes in legal education. So what are the implications for legal education at a global level? As legal education becomes increasingly globalized, should future lawyers and judges also be trained to build and use legal tech tools? And if so, in which language(s)? Arguably, besides tech skills, which are increasingly becoming a learning priority in legal education, the next generation of lawyers could benefit from a systematically comparative reading and understanding of the law, from familiarity with multiple jurisdictions, and ideally from multilingual skills. These skills and assets would allow future lawyers to actively participate in the creation of legal tools in their respective jurisdictions and in niche areas of practice. One could predict that fluency in more than one of the most widely spoken natural languages will remain a valuable asset in facing the new challenges and opportunities that the recent transformations of the legal profession will bring about. This applies, I argue, to future innovation in the legal industry, and now also extends to transformations in the legal tech domain.

Article published in Legal Business World

Dr. Argyri Panezi, Assistant Professor of Law and Technology at IE University, is an expert in law and technology and intellectual property. She specializes in Internet law and policy and intellectual property law, with an emphasis on digital copyright, as well as in data protection, intellectual goods management, automation, machine learning and AI. Her current research focuses on digitization and AI.

Note: The views expressed by the author of this paper are completely personal and do not represent the position of any affiliated institution.

[1] See, for example, Tim Wu, The Curse of Bigness: Antitrust in the New Gilded Age (2018).
[2] The Medium, The Privacy Paradox: Is the End of Privacy Inevitable? The Price of Convenience, August 2017, available at
[3] See Alex Campolo, Madelyn Sanfilippo, Meredith Whittaker & Kate Crawford, AI Now 2017 Report, available at
[4] General Secretariat of the Council, Directorate Report, The language service of the General Secretariat of the Council of the European Union: making multilingualism work (2012), available at