What is Predictive Coding?


Moore v. Publicis Groupe, No. 11 Civ. 1279 (ALC)(AJP), 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012)

Magistrate Judge Peck had previously written an article advocating the use of predictive coding in the appropriate case, but had observed that many attorneys were reluctant to use the technology absent a judicial opinion approving its use. In Moore, Judge Peck approved its use.

The case involved a suit by plaintiff female employees of defendant advertising agency, alleging gender discrimination in pay and promotion practices, as well as unfair terminations, demotions and job reassignments during a corporate reorganization. Although plaintiffs sought to bring some claims as a “collective action”, they had not made any motions to that effect.

At the initial discovery conference, defendant MSL indicated that the parties were discussing the use of predictive coding to cull the collection of more than three million electronic documents, but that plaintiffs had concerns about its use. The court rejected MSL’s early proposal to review only the top 40,000 documents after the computer had been fully trained, noting that the total production would depend on what the statistics revealed about the results.

After some disputes regarding custodians, MSL proposed to phase the discovery, and complete the first phase before beginning the second. Noting plaintiffs’ concerns regarding sufficient time to complete discovery, the court agreed to extend the deadline in order to pursue the phased approach. Accordingly, the court agreed to delay utilization of certain ESI sources to the second phase to give the plaintiffs time to provide more information to the court regarding those data sources.

The parties agreed to take a random sample of the entire email collection at the 95% confidence level to provide a “seed set” of 2,399 documents to train the software. The review would be conducted by senior attorneys. MSL agreed to provide the documents to plaintiffs, who could add two more sets of issue tags, which would be incorporated into the system coding. Four thousand additional documents would be generated through keyword searches from both defendants and plaintiffs, and all non-privileged documents comprising the seed set would be provided to plaintiffs, whether relevant or not.
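The seed-set size of 2,399 is consistent with a standard sample-size calculation. The sketch below is illustrative only and is not taken from the opinion: it assumes a margin of error of plus or minus 2%, a worst-case proportion of 0.5, and a population of exactly 3,000,000 documents (the summary above gives only the 95% confidence level and "over three million" documents). The function name `sample_size` is hypothetical.

```python
def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    z      -- z-score for the confidence level (1.96 for 95%)
    margin -- ASSUMED margin of error (not stated in the summary)
    p      -- ASSUMED proportion; 0.5 is the most conservative choice
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Adjust downward for a finite population
    return n0 / (1 + (n0 - 1) / population)


# ASSUMED population of 3,000,000 ("over three million" in the text)
n = sample_size(3_000_000)
print(round(n))  # 2399
```

Under these assumptions the result is about 2,399.1, which rounds to 2,399. Rounding up instead, as is sometimes done, would give 2,400.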

The court agreed to seven iterative rounds; i.e., in each round, 500 documents would be reviewed to determine whether the computer was returning relevant documents. After the seventh round, a random sample of 2,399 discarded documents would be reviewed to ensure that the discards were, in fact, not relevant. The court reserved the right to order additional iterative rounds if plaintiffs objected to the results.
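To make the discard check concrete, the following is a minimal sketch of how the result of such a sample could be summarized. It is an illustration, not the method ordered in the case: the function name `discard_estimate` is hypothetical, the count of relevant documents is an invented placeholder, and the interval uses a simple normal approximation.

```python
import math


def discard_estimate(relevant_found, sample_size, z=1.96):
    """Estimate the share of relevant documents left in the discard pile.

    Returns (point_estimate, lower_bound, upper_bound) using a
    normal-approximation interval (z=1.96 corresponds to 95%).
    """
    p = relevant_found / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    # A proportion cannot fall outside [0, 1], so clamp the bounds
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# HYPOTHETICAL input: 24 relevant documents found among 2,399 sampled discards
p, low, high = discard_estimate(24, 2399)
print(f"estimated rate: {p:.4f} (95% interval {low:.4f} to {high:.4f})")
```

A low estimated rate would suggest the discards are overwhelmingly non-relevant, while a high one would be a reason to consider further training rounds.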

The court then discussed several objections by plaintiffs to its earlier rulings. First, plaintiffs objected that the court’s acceptance of predictive coding allowed defendants to escape certification under Rule 26(g) that their clients’ document production was “complete” and “correct”. The court responded that no attorney could certify that a production drawn from a document set of over three million emails was “complete,” and that the rule did not require it. The Rule 26(g)(1)(A) certification applies only “with respect to a disclosure,” meaning Rule 26(a)(1) disclosures of information such as witnesses or exhibits used to support claims or defenses. Rule 26(g)(1)(B) certifications apply to discovery responses and do not require completeness; instead, they incorporate the Rule 26(b)(2)(C) proportionality principle.

Plaintiffs’ objection that the court’s acceptance of predictive coding was contrary to Federal Rule of Evidence 702 was misplaced, because that rule governs the admissibility of expert evidence at trial, not the process by which documents are searched for and found during discovery.

Plaintiffs’ concerns about the reliability of predictive coding were premature. The court noted that reliability could not be assessed until the process was actually run. Likewise, proportionality concerns could not be addressed until the number of relevant documents was determined, the nature of the results was evaluated (for example, whether highly relevant “smoking guns” were found), or plaintiffs decided whether to seek class certification.

In discussing lessons for the future, the court cited several studies and projects showing that the belief that manual document review is the “gold standard” is a fallacy, and that computer-assisted review is more accurate and efficient. The court also noted problems with keyword searching, which tends to be over-inclusive and not very effective. The court concluded that computer-assisted coding should be used in appropriate cases.

The court also noted that the parties’ agreement to use predictive coding was an important factor in its decision to allow it, as was the defendants’ transparency about the process. The court urged parties in future cases to at least discuss such transparency.

The court also observed that it is not possible to determine when review and production can stop until the predictive coding software has been trained and the results verified through quality control. In addition, staging discovery by starting with the data sources most likely to be relevant is a means of controlling discovery costs, and judges should be willing to extend discovery deadlines if additional stages are necessary. Counsel should also draw on their clients’ knowledge of the opposing party’s systems (for example, in employment disputes or other cases in which the parties have extensive knowledge of each other). Finally, the court found it helpful to have the parties’ e-discovery vendors present when ESI issues were discussed.

The court concluded that “computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review.”
