reglab-rrc/mistral-rrc

Paper: AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County

Overview of Model Details

Model name: reglab-rrc/mistral-rrc
Version: 1.0
Release date: October 17, 2024
Model type: Finetuned causal language model (Mistral 7B)
License: Open-source, licensed under the MIT License
Language: English
Domains: Legal documents (real property deeds)
Task: Text classification and extraction (racial covenant detection)

Usage

Here is an example of how to use the model to find racial covenants in a page of a deed:

from transformers import AutoTokenizer, AutoModelForCausalLM
import re

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("reglab-rrc/mistral-rrc")
model = AutoModelForCausalLM.from_pretrained("reglab-rrc/mistral-rrc")

def format_prompt(document):
    return f"""### Instruction:
Determine whether the property deed contains a racial covenant. A racial covenant is a clause in a document that \
restricts who can reside, own, or occupy a property on the basis of race, ethnicity, national origin, or religion. \
Answer "Yes" or "No". If "Yes", provide the exact text of the relevant passage and then a quotation of the passage \
with spelling and formatting errors fixed.

### Input:
{document}

### Response:"""

def parse_output(output):
    answer_match = re.search(r"\[ANSWER\](.*?)\[/ANSWER\]", output, re.DOTALL)
    raw_passage_match = re.search(r"\[RAW PASSAGE\](.*?)\[/RAW PASSAGE\]", output, re.DOTALL)
    quotation_match = re.search(r"\[CORRECTED QUOTATION\](.*?)\[/CORRECTED QUOTATION\]", output, re.DOTALL)
    
    answer = answer_match.group(1).strip() if answer_match else None
    raw_passage = raw_passage_match.group(1).strip() if raw_passage_match else None
    quotation = quotation_match.group(1).strip() if quotation_match else None
    
    return {
        "answer": answer == "Yes",
        "raw_passage": raw_passage,
        "quotation": quotation
    }

# Example usage
document = "[[Your property deed text here...]]"
prompt = format_prompt(document)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
result = tokenizer.decode(outputs[0])
parsed_result = parse_output(result)

print(parsed_result)

Input and Output Formats

The model was trained with the input and output formats above, so please make sure to use these formats when running inference.

Input Format: The model accepts property deed documents in text format. It expects properly formatted prompts based on the instructional format outlined in the usage example, including the instruction to detect racial covenants and provide corrected text if found.
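Output Format: The response provides the following, each wrapped in the tags that parse_output in the usage example looks for ([ANSWER], [RAW PASSAGE], and [CORRECTED QUOTATION]):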
- An answer to whether a racial covenant is present ("Yes" or "No").
- The raw text of the racial covenant if detected.
- A corrected quotation of the racial covenant text with spelling and formatting errors fixed.
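
As a minimal sketch of what the parsing step in the usage example expects, the snippet below applies the same tag-based extraction to a made-up response string. The tag names come from parse_output above; the sample response text, its line layout, and the helper name extract_tag are illustrative assumptions, not actual model output.

```python
import re

def extract_tag(output, tag):
    """Return the stripped text between [tag] and [/tag], or None if absent."""
    escaped = re.escape(tag)
    match = re.search(rf"\[{escaped}\](.*?)\[/{escaped}\]", output, re.DOTALL)
    return match.group(1).strip() if match else None

# Hypothetical response, written by hand to show the three tagged fields.
sample_response = (
    "[ANSWER]Yes[/ANSWER]\n"
    "[RAW PASSAGE]no persan other than of the Caucasian race shall occupy "
    "said premises[/RAW PASSAGE]\n"
    "[CORRECTED QUOTATION]No person other than of the Caucasian race shall "
    "occupy said premises.[/CORRECTED QUOTATION]"
)

parsed = {
    "answer": extract_tag(sample_response, "ANSWER") == "Yes",
    "raw_passage": extract_tag(sample_response, "RAW PASSAGE"),
    "quotation": extract_tag(sample_response, "CORRECTED QUOTATION"),
}
print(parsed)
```

A response with no tags at all yields None for every field, so "answer" becomes False; callers may want to treat that case as "unparseable" rather than as a confident "No".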

Intended Use

The finetuned Mistral model (reglab-rrc/mistral-rrc) is designed to detect and extract racial covenants from property deed documents. Racial covenants, also called racially restrictive covenants (RRCs), are clauses that historically restricted property ownership or residence based on race, ethnicity, national origin, or religion. This model aims to aid jurisdictions, such as Santa Clara County (CA), in identifying these covenants for removal or redaction, as mandated by laws like California's AB 1466. The intended use is to prioritize documents for review, reducing the time and resources required for human auditors to locate RRCs manually, particularly in large datasets of property deeds. Legal professionals and government entities can integrate the model into workflows to streamline and scale up the process of identifying racially discriminatory language in real estate records.

Training Data

The Mistral 7B model was finetuned on a collection of property deed documents gathered from eight counties across the United States, including Santa Clara County (CA). To account for potential variations in document formatting, OCR quality, and phrasing, data augmentation included property deeds from other jurisdictions, such as Bexar County (TX), Cuyahoga County (OH), and Hidalgo County (TX). In total, the training dataset comprised 3,801 annotated deed pages, with 2,987 (78.6%) containing racially restrictive covenants. The dataset was balanced with both positive and negative examples, derived from keyword-based searches and manual annotation efforts. The data was annotated through a multi-stage process, which included manual verification of model predictions and the development of a web-based annotation tool for more efficient data labeling. (For additional details about data augmentation and training, please refer to our paper.)

Performance

The finetuned model was evaluated on a held-out test set of 739 pages from the original dataset, with approximately 70% of these pages containing racial covenants. Performance metrics for the model include page-level precision, recall, and F1 score, as well as span-level BLEU scores, to measure how accurately the model reproduced the exact span of the detected covenant text. The results are as follows:

Precision: 1.000 (95% CI: 0.995-1.000)
Recall: 0.994 (95% CI: 0.984-0.997)
F1 score: 0.997
BLEU score: 0.932 (for span-level accuracy of detected covenants)

The finetuned Mistral model outperformed other approaches, including keyword and fuzzy matching as well as zero-shot and few-shot GPT models, particularly in recall and precision.
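
The reported F1 score can be checked against the precision and recall listed above, since F1 is their harmonic mean. A short sketch (the function name f1_score is our own and is not taken from the paper's code):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Page-level precision and recall as reported in this section.
reported_f1 = f1_score(1.000, 0.994)
print(round(reported_f1, 3))  # 0.997
```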

Limitations

Despite the finetuned Mistral model's strong performance in detecting racial covenants, several limitations remain:

Generalizability Across Jurisdictions: This model was primarily finetuned on property deeds from eight counties, including Bexar County (TX), Cuyahoga County (OH), and Santa Clara County (CA). While we took care to include a variety of document types and OCR qualities, property deed language and formatting can vary significantly by jurisdiction. As a result, the model's performance may degrade when applied to regions with distinct linguistic, legal, or historical document structures. Future efforts should include jurisdiction-specific validation to ensure accurate detection in areas with unique property deed formats.
Sensitivity to OCR Artifacts: Although the model is robust to many types of OCR (optical character recognition) errors, heavily degraded documents or those with extremely poor scan quality may still pose challenges. Scanning artifacts can introduce noise that obscures key terms, leading to either missed racial covenants (false negatives) or incorrect detections (false positives). This remains a potential source of error, particularly in counties with older, handwritten, or poorly preserved records.
Contextual Ambiguity: The model relies on semantic analysis to identify racial covenants, and while this enhances its ability to detect atypical language, some ambiguity remains. For instance, a term like "white" could refer to a racial category or to a person's name, and the model's ability to disambiguate such terms is not perfect, especially when poor scan quality makes it difficult to determine the usage of the ambiguous term from the semantic content of the deed. In such cases, legal professionals must still verify the results, ensuring no improper redactions or omissions occur.
Historical Document Complexity: The language used in older property deeds can be complex and archaic. Some racial covenants may be expressed in subtle or convoluted ways that could evade even the most advanced language models. While the model has shown strong performance in capturing most covenants, human oversight remains crucial, particularly for documents with unusual or legally obscure phrasing.
Dependency on Human Review: Although the model significantly reduces the manual work, legal review is still required for final verification. This human-in-the-loop approach mitigates the risk of false positives, but it means the model does not eliminate the need for expert intervention, particularly in the redaction and historical preservation processes.
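
To illustrate the contextual ambiguity item above, here is a deliberately naive keyword matcher. It is a hypothetical sketch, not the keyword baseline evaluated in the paper; the term list, sample sentences, and function name are assumptions chosen for the example. It flags a surname as readily as a racial term, which is the kind of case that needs surrounding context (and, ultimately, human review) to resolve.

```python
import re

# Hypothetical term list for illustration only.
KEYWORDS = ["white", "caucasian", "race"]
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)

def keyword_flag(page_text):
    """Return True if any keyword appears as a whole word in the page."""
    return PATTERN.search(page_text) is not None

# Hand-written sample sentences, not taken from any real deed.
name_only = "This indenture conveys Lot 12 to John White and his heirs."
covenant = "No part of said premises shall be occupied by any person not of the Caucasian race."
neutral = "The grantee shall maintain the fence along the northern boundary."

for text in (name_only, covenant, neutral):
    print(keyword_flag(text), "-", text)
```

The first sentence is flagged even though "White" is a surname, so keyword matching alone cannot separate it from the second sentence.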

Ethical Considerations

The deployment of a language model for detecting racial covenants raises several important ethical considerations. We have done our best to carefully address these concerns throughout the project:

Preservation of Historical Memory: A key ethical consideration in this project is balancing the removal of offensive language from property deeds with the need to preserve historical records. While the model identifies and assists in redacting racially restrictive covenants, these covenants are also preserved in a historical registry by the County. This ensures that the history of housing discrimination is not erased but documented and made accessible for future research and public awareness. The creation of this historical record serves as an educational tool to understand the deep and troubling legacy of racial exclusion in housing markets.
Accountability and Oversight: The system has been designed with a clear chain of accountability, as required by California’s AB 1466. All flagged documents must undergo legal review, ensuring that no inappropriate redactions occur and that the process is transparent and accurate. This human oversight safeguards against over-reliance on automated systems, which, while highly effective, are not infallible. Our current AI-driven pipeline prioritizes documents for review, but final decisions rest with human experts (most specifically, legal professionals), mitigating the risk of both false positives and false negatives.
Bias and Fairness: The model is trained on historical documents that reflect the racial and social biases of the time. While the model itself is neutral in its detection of racially restrictive language, the training data may inherently carry these biases, as they originate from a time when discriminatory covenants were legally permissible. Ongoing efforts are required to ensure that the model does not perpetuate unintended biases, especially in jurisdictions with different historical contexts. Regular validation across diverse datasets and jurisdictions is essential to prevent any unfair outcomes.
Accessibility and Open Model: By choosing to finetune an open-source model (Mistral 7B), this project has prioritized transparency and accessibility. This decision makes the technology available to smaller counties and community-based organizations, many of which lack the resources to develop or license proprietary solutions. The release of the model empowers a broader range of actors to engage in legal reform efforts, fostering greater equity in the identification and removal of racial covenants. Additionally, privacy concerns have been addressed by masking private information in the training data, ensuring that the model does not learn or reproduce sensitive data.
Advancing Public Good: This project exemplifies how AI can be leveraged for the public good. By revealing patterns of housing discrimination and aiding in legal reform, the model contributes to ongoing efforts to address historical injustices. Beyond merely automating a legal task, this project enhances our understanding of systemic racism in the housing market, adding valuable insights to the academic and public discourse. It is a powerful illustration of how technology can assist in the pursuit of justice, equity, and historical accountability.

Citation

If your work makes use of our model, data, or results, we request that you cite our paper as follows:

@article{suranisuzgun2024,
  title={AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County},
  author={Surani, Faiz and Suzgun, Mirac and Raman, Vyoma and Manning, Christopher D. and Henderson, Peter and Ho, Daniel E.},
  url={https://dho.stanford.edu/wp-content/uploads/Covenants.pdf},
  year={2024}
}
