prompt for professional legal terminology and enforce monetary subject extraction

Add a standard-legal-terminology instruction to the reason/object,
judgment-result, entities, and summary prompts (ZH + EN). Require
case_object to carry specific monetary amounts, reinforced with a
few-shot example and a context-aware validator that retries when the
source mentions a sum but no amount was extracted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
main
fengruixiang 2026-05-28 12:28:39 +08:00
parent ff81d2af09
commit bc2b9826d5
1 changed files with 60 additions and 9 deletions

View File

@ -760,13 +760,19 @@ REASON_OBJECT_SYSTEM_ZH = f"""從香港判決書中抽取:
2. case_object標的物
- 訴訟請求指向的實體權利或利益
- 汽車黃金珠寶不動產產權撫養權人身傷害賠償合同履行房產所有權精神困擾賠償居留權
- 涉及金錢標的時必須提取並標明幣種與具體金額拖欠貨款 HK$850,000索償 HK$1,000,000拖欠租金 HK$120,000金額未定或待評定者註明金額待評定
- 合併本質相同的標的
- 嚴禁證據材料程序性訴求"要求法庭裁決"法律條文名稱
用語要求一律使用規範的法律專業用語申索損害賠償違約侵權衡平法濟助避免口語化或不準確的表述
只輸出 JSON"""
REASON_OBJECT_FEWSHOT_ZH = """範例輸出:
{"case_reason":"申索人為商場保安員就被告於2023年7月在商場毆打申索人造成的人身傷害向被告提出損害賠償申索。","case_object":["人身傷害賠償","醫療費用","精神困擾賠償"]}"""
REASON_OBJECT_FEWSHOT_ZH = """範例輸出1人身傷害
{"case_reason":"申索人為商場保安員就被告於2023年7月在商場毆打申索人造成的人身傷害向被告提出損害賠償申索。","case_object":["人身傷害賠償","醫療費用","精神困擾賠償"]}
範例輸出2金錢申索標的物須含具體金額
{"case_reason":"原告就被告未支付2022年買賣合約項下的貨款向被告提出追討欠款的申索。","case_object":["拖欠貨款 HK$850,000","合約利息","訟費"]}"""
# 英文提示詞
REASON_OBJECT_SYSTEM_EN = f"""Extract from Hong Kong judgment:
@ -781,16 +787,37 @@ REASON_OBJECT_SYSTEM_EN = f"""Extract from Hong Kong judgment:
2. case_object (Subject Matter):
- Tangible rights or interests targeted by the claim
- Examples: vehicle, gold, jewelry, property title, custody, personal injury damages, contract performance, property ownership, distress damages, right of abode
- When a monetary subject matter is involved, it MUST be extracted with currency and the specific amount (e.g., "outstanding goods price HK$850,000", "claim of HK$1,000,000", "arrears of rent HK$120,000"); if the amount is unascertained, note "amount to be assessed"
- Merge essentially identical subjects
- MUST NOT include: evidentiary materials, procedural requests (e.g., "seeking court ruling"), names of statutes
Terminology requirement: consistently use precise, standard legal terminology (e.g., "claim", "damages", "breach of contract", "negligence", "equitable relief"); avoid colloquial or imprecise wording.
Output only JSON."""
REASON_OBJECT_FEWSHOT_EN = """Example Output:
{"case_reason":"Plaintiff, a shopping mall security guard, claims damages from defendant for personal injuries caused by defendant's assault at the mall in July 2023.","case_object":["personal injury damages","medical expenses","distress damages"]}"""
REASON_OBJECT_FEWSHOT_EN = """Example Output 1 (personal injury):
{"case_reason":"Plaintiff, a shopping mall security guard, claims damages from defendant for personal injuries caused by defendant's assault at the mall in July 2023.","case_object":["personal injury damages","medical expenses","distress damages"]}
Example Output 2 (monetary claim, subject matter must carry the specific amount):
{"case_reason":"Plaintiff claims against defendant for non-payment of the price of goods supplied under a 2022 sale and purchase contract.","case_object":["outstanding goods price HK$850,000","contractual interest","costs"]}"""
def _reason_object_validator(out: dict, lang: str = 'zh') -> tuple[bool, str]:
# 金錢數額識別:幣種前綴 + 數字,或數字 + 中文金額單位
MONEY_RE = re.compile(
r"(?:HK\$|US\$|RMB|MOP|港幣|港元|人民幣|美元|美金)\s*[\d,]+(?:\.\d+)?"
r"|[\$]\s*[\d,]+(?:\.\d+)?"
r"|[\d,]+(?:\.\d+)?\s*(?:萬元|億元|元|萬|億)",
re.I,
)
def _object_has_amount(objs: list[str]) -> bool:
"""case_object 中是否已含具體金額(任一項出現數字即視為已提取金額)"""
return any(re.search(r"\d", o or "") for o in (objs or []))
def _reason_object_validator(out: dict, lang: str = 'zh',
context: str = "") -> tuple[bool, str]:
r = out.get("case_reason", "")
max_length = 100 if lang == 'zh' else 200 # 英文允許 2 倍字符數
target_length = 80 if lang == 'zh' else 160 # 建議壓縮目標
@ -818,7 +845,19 @@ def _reason_object_validator(out: dict, lang: str = 'zh') -> tuple[bool, str]:
return False, f"case_reason 不應包含判決結果詞彙「{keyword}」,請只描述訴訟起因和請求。"
else:
return False, f"case_reason should not contain judgment result term '{keyword}', describe only cause and relief sought."
# 金錢標的物強制提取:原文出現金錢數額但 case_object 未含金額時要求補充
if context and MONEY_RE.search(context) and not _object_has_amount(out.get("case_object", [])):
if lang == 'zh':
return False, ("原文出現金錢數額。若該數額屬於訴訟標的(如欠款、索償、賠償金額),"
"必須在 case_object 中提取並標明幣種與具體金額(如「拖欠貨款 HK$850,000」"
"若僅為無關引用則可忽略。")
else:
return False, ("Monetary amounts appear in the source. If an amount forms part of the "
"subject matter (e.g., debt, claim, damages), it MUST be extracted in "
"case_object with currency and the specific figure (e.g., \"outstanding "
"goods price HK$850,000\"); ignore only if it is an unrelated citation.")
return True, ""
@ -837,7 +876,7 @@ def extract_reason_object(client: OpenAICompatibleClient, context: str, lang: st
return client.chat_json_with_retry(system, user,
schema,
validator=lambda x: _reason_object_validator(x, lang))
validator=lambda x: _reason_object_validator(x, lang, context[:5000]))
# --- Call 3: 判決結果 -------------------------------------------------------
@ -875,7 +914,9 @@ JUDGMENT_RESULT_SYSTEM_ZH = """從香港判決書尾部的命令/裁定部分抽
- result 必須包含
a) 明確結果勝訴/敗訴/部分勝訴/維持/撤銷/駁回/發還等
b) 2-3 個關鍵法庭理由如有
c) 具體金額利率或命令內容如有
c) 具體金額利率或命令內容凡有判給/命令支付的金錢數額必須原文照錄幣種與金額 HK$28,500不得省略或約化
用語要求一律使用規範的法律專業用語判給訟費利息駁回發還重審避免口語化或不準確的表述
只輸出 JSON"""
@ -900,7 +941,9 @@ Splitting Principles:
- result must include:
a) Clear outcome (allowed/dismissed/partially allowed/upheld/quashed/remitted, etc.)
b) 2-3 key court reasons (if any)
c) Specific amounts, interest rates or order details (if any)
c) Specific amounts, interest rates or order details; whenever a sum is awarded/ordered to be paid, the currency and figure MUST be reproduced verbatim (e.g., HK$28,500), never omitted or rounded
Terminology requirement: consistently use precise, standard legal terminology (e.g., "awarded", "costs", "interest", "dismissed", "remitted for retrial"); avoid colloquial or imprecise wording.
Output only JSON."""
@ -998,6 +1041,8 @@ ENTITIES_SYSTEM_ZH = """從香港判決書中抽取所有相關實體(自然
- 純案例名稱 Donoghue v Stevenson
- 文獻期刊名
用語要求reason 一律使用規範的法律專業用語主審闡述先例判詞避免口語化或不準確的表述
只輸出 JSON"""
ENTITIES_FEWSHOT_ZH = """範例輸出:
@ -1022,6 +1067,8 @@ MUST NOT include:
- Pure case names (e.g., Donoghue v Stevenson)
- Literature, journal names
Terminology requirement: write each reason in precise, standard legal terminology (e.g., "presiding", "articulated", "precedent", "judgment"); avoid colloquial or imprecise wording.
Output only JSON."""
ENTITIES_FEWSHOT_EN = """Example Output:
@ -1106,6 +1153,8 @@ SUMMARY_SYSTEM_ZH = """根據已抽取的結構化字段 + 法庭分析段,撰
**重要judgment_summary 必須使用中文撰寫**
用語要求一律使用規範的法律專業用語涉及金錢的判給或標的須保留具體金額含幣種
嚴格 300 只輸出 JSON"""
# 英文提示詞
@ -1122,6 +1171,8 @@ Four-element structure (must cover all, in coherent single paragraph):
**IMPORTANT: judgment_summary MUST be written in English.**
Terminology requirement: consistently use precise, standard legal terminology; preserve specific monetary figures (with currency) for any award or subject matter involving money.
Strictly 500 characters. Output only JSON."""