USING GENERATIVE AI FOR CLASSIFICATION OF LEGAL DOCUMENTS

Ruan Dias Santana; Gabriel Reis Nadler Prata; Marcelo da Silva Lisboa; Silvanete Maria da Silva; Marcelo Lisboa Rocha

doi:10.66104/4zsfgz56

Authors

Ruan Dias Santana, Universidade Federal do Tocantins
Gabriel Reis Nadler Prata, Universidade Federal do Tocantins
Marcelo da Silva Lisboa, Secretaria de Administração do Tocantins (SECAD)
Silvanete Maria da Silva, Secretaria de Educação do Tocantins (SEDUC)
Marcelo Lisboa Rocha, Federal University of Tocantins (UFT), https://orcid.org/0000-0002-4034-0021

DOI:

https://doi.org/10.66104/4zsfgz56

Keywords:

Judicial Efficiency, Legal Technology, Artificial Intelligence in Law, Procedural Automation, Digital Justice

Abstract

The Brazilian judicial system faces a large backlog of digital lawsuits, which makes manual case sorting both costly and unreliable. This research explores the use of Generative Artificial Intelligence, specifically Large Language Models (LLMs), to automate the categorization of legal petitions. The study proceeds in three phases. First, a few-shot learning approach was tested and reached a modest accuracy of 56%. Second, the method was refined through prompt engineering combined with N-gram analysis and data augmentation to address skewed (imbalanced) datasets. Third, a Retrieval-Augmented Generation (RAG) framework was implemented to improve performance further. In experiments on real-world data from the Court of Justice of Tocantins, the RAG-based system achieved 84% accuracy across 11 complex legal categories. This architecture reduced the occurrence of AI hallucinations and helped resolve the semantic ambiguities often found in legal texts. The findings suggest that the approach offers a reliable and scalable framework for the LegalTech industry and a viable path toward modernizing judicial administration. By automating the initial stages of case management, the proposed solution improves operational efficiency and makes the processing of legal documents more consistent, contributing to a more agile and responsive justice system in Brazil and potentially in other jurisdictions facing similar digital challenges.
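The retrieval step of a RAG-style classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the example petitions, the category names, the word n-gram cosine similarity used for retrieval, and the prompt wording are all assumptions made for illustration, and the call to an actual LLM is deliberately left out. The sketch retrieves the labeled petitions most similar to a new one and places them in the prompt as few-shot examples.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Count word n-grams (unigrams up to n_max-grams) in lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, k=2):
    """Return the k labeled examples most similar to the query."""
    q = ngrams(query)
    ranked = sorted(store, key=lambda ex: cosine(q, ngrams(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, examples, labels):
    """Assemble a few-shot prompt from the retrieved examples."""
    lines = ["Classify the petition into exactly one of: " + ", ".join(labels) + ".", ""]
    for text, label in examples:
        lines += [f"Petition: {text}", f"Category: {label}", ""]
    lines += [f"Petition: {query}", "Category:"]
    return "\n".join(lines)


# Hypothetical labeled store; real petitions and categories would differ.
STORE = [
    ("Request for payment of overdue alimony to the minor child", "family"),
    ("Claim for compensation after a traffic accident with vehicle damage", "civil liability"),
    ("Appeal against tax assessment on municipal property tax", "tax"),
    ("Petition to review child custody and alimony amount", "family"),
]
LABELS = sorted({label for _, label in STORE})

query = "Petition asking to increase the alimony paid to the child"
examples = retrieve(query, STORE, k=2)
prompt = build_prompt(query, examples, LABELS)
print(prompt)
```

In a full pipeline the resulting prompt would be sent to an LLM and the returned category compared against the allowed label list. Grounding the prompt in retrieved, already-labeled petitions is one plausible reason a RAG setup would hallucinate less than a fixed few-shot prompt.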

Author Biography

Marcelo Lisboa Rocha, Federal University of Tocantins (UFT)

Holds a bachelor's degree in Computer Science from the Catholic University of Petrópolis (1994), a master's degree in Computer Science from the Fluminense Federal University (1997), a master's degree in Electrical Engineering from the Federal University of Rio de Janeiro (1999), and a PhD in Electrical Engineering from the Federal University of Rio de Janeiro (2008). He completed a postdoctoral program in Computational Modeling at the State University of Rio de Janeiro (IPRJ campus) from 2018 to 2019, under the supervision of Prof. Antônio José da Silva Neto and co-supervised by Prof. Orestes Llanes Santiago, carrying out the project entitled "Identification and Representation of Faults in Industrial Systems Using Machine Learning and Data Mining Techniques." He is currently pursuing a postdoctoral program in Information Systems at USP under the supervision of Prof. Sarajane Marques Peres, working on the project "Application of Deep Neural Networks to Graphs and Association Rules for Identifying Anomalous Events in Process Mining." He is a member of the Editorial Board of Cereus Journal (ISSN 2175-7275). He is currently a Full Professor at the Federal University of Tocantins (UFT), Palmas campus, and a permanent faculty member in the Graduate Program in Governance and Digital Transformation (PPGGTD) and in the undergraduate program in Computer Science. He has experience in Computer Science, focusing primarily on metaheuristics, operations research, machine learning, deep learning, and high-performance computing. Currently, his main research area is data analysis using artificial intelligence and statistical techniques. He is an advisor for the Master's and Doctoral programs in Governance and Digital Transformation (Systems Optimization and Computational Intelligence).

References

BENTO, F. M.; TEIVE, R. C. G. Classificação de documentos jurídicos utilizando a arquitetura transformer: uma análise comparativa com algoritmos tradicionais de Machine Learning e ChatGPT. Brazilian Journal of Development, v. 9, p. 20208–20224, 2023. DOI: https://doi.org/10.34117/bjdv9n6-97

BROWN, T. et al. Language models are few-shot learners. Advances in neural information processing systems, 2020.

DEVLIN, J. et al. BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). [S.l.]: [s.n.]. 2019. p. 4171–4186.

LEWIS, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems. [S.l.]: [s.n.]. 2020. p. 9459–9474.

MAURITZ, B. J. Automatic classification of legal documents. Master’s thesis, Masarykova univerzita. 2018.

SHUKLA, B. et al. Challenges and issues in legal documents classification. AIP Conference Proceedings. 2023. DOI: https://doi.org/10.1063/5.0161060

TEAM, G. et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.

VASWANI, A. et al. Attention is all you need. Advances in neural information processing systems 30. 2017. p. 5998–6008.
