Overview & Goal


Highway-rail crossings are critical points in the nation's transportation infrastructure, and they are also a significant source of safety risk. The Federal Railroad Administration (FRA) maintains data on every public and private crossing in the United States, and this data is central to understanding and mitigating those risks.


The goal of this competition is to develop a model that assesses and predicts accident risk at highway-rail crossings. The challenge focuses specifically on the needs of stakeholders in rural areas. Your insights will help them prioritize safety investments and make data-driven decisions.


The Dataset: FRA Crossing Inventory Data (Form 71)

The primary dataset for this competition is the FRA's Crossing Inventory Data (Form 71). It is the official, authoritative repository of information concerning all U.S. highway-rail interfaces.


This is a rich dataset. For each crossing (identified by a unique U.S. DOT Grade Crossing ID), the inventory includes detailed information on:


Participants are required to research and develop a deep understanding of this dataset. You will need to consult the official data dictionaries and documentation to interpret the fields correctly.


The Problems


This competition is divided into two primary challenges that build upon one another. A successful project will excel at both.


Challenge 1: Comprehensive Data Analysis


Before you can model risk, you must understand the data. Your first task is to conduct a thorough exploratory data analysis (EDA) of the Form 71 inventory. The goal is to uncover patterns, identify key variables, and communicate your findings effectively.
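As a starting point for the EDA, the sketch below profiles a handful of records: the share of blank values per column, the frequency of a categorical field, and a basic numeric summary. It is a minimal illustration only. The rows are made-up toy data, and the column names (`crossing_id`, `warning_device`, `aadt`, `trains_per_day`) are hypothetical placeholders, not actual Form 71 field names; take the real names and codes from the official data dictionary.

```python
from collections import Counter
from statistics import median

# Toy rows standing in for inventory records. Column names and values are
# hypothetical; real field names come from the official data dictionary.
ROWS = [
    {"crossing_id": "A1", "warning_device": "gates", "aadt": "1200", "trains_per_day": "8"},
    {"crossing_id": "A2", "warning_device": "crossbucks", "aadt": "", "trains_per_day": "2"},
    {"crossing_id": "A3", "warning_device": "crossbucks", "aadt": "300", "trains_per_day": ""},
    {"crossing_id": "A4", "warning_device": "flashing_lights", "aadt": "4500", "trains_per_day": "14"},
]


def _is_blank(value):
    """True when a cell is missing or contains only whitespace."""
    return value is None or str(value).strip() == ""


def missing_rate(rows, column):
    """Fraction of rows in which `column` is absent or blank."""
    return sum(_is_blank(r.get(column)) for r in rows) / len(rows)


def value_counts(rows, column):
    """Frequency of each non-blank value in `column`."""
    return Counter(r[column] for r in rows if not _is_blank(r.get(column)))


def numeric_summary(rows, column):
    """(min, median, max) over the non-blank values of a numeric column."""
    values = [float(r[column]) for r in rows if not _is_blank(r.get(column))]
    return min(values), median(values), max(values)


if __name__ == "__main__":
    for col in ("warning_device", "aadt", "trains_per_day"):
        print(col, "missing rate:", missing_rate(ROWS, col))
    print(value_counts(ROWS, "warning_device"))
    print(numeric_summary(ROWS, "aadt"))
```

Checking missing-value rates first is a deliberate choice: a field that is blank for a large share of crossings needs an explicit handling decision before it can be used as a model input.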


Potential deliverables for this challenge include:


Challenge 2: Risk Assessment & Prediction Model

The core of this competition is to build a model that assesses and predicts accident risk at highway-rail crossings. You will need to clearly define your "risk" metric: is it the probability of an accident, the predicted severity of an accident, or a combined index?
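To make the choice of metric concrete, the sketch below compares the three candidates on two made-up crossings. It assumes a model has already produced an accident probability and a conditional severity estimate for each crossing, and it uses their product (an expected-severity score) as one possible combined index. The class name, field names, and numbers are hypothetical, and other combinations are equally defensible.

```python
from dataclasses import dataclass


@dataclass
class CrossingEstimate:
    crossing_id: str
    p_accident: float  # estimated probability of an accident in the period
    severity: float    # estimated severity given an accident (any consistent scale)


def combined_index(estimate):
    """One possible combined index: probability times conditional severity."""
    return estimate.p_accident * estimate.severity


def highest_by(estimates, metric):
    """ID of the crossing that scores highest under the given metric."""
    return max(estimates, key=metric).crossing_id


# Made-up values chosen so that the metrics disagree.
ESTIMATES = [
    CrossingEstimate("A", p_accident=0.10, severity=1.0),
    CrossingEstimate("B", p_accident=0.02, severity=10.0),
]

if __name__ == "__main__":
    print("by probability:", highest_by(ESTIMATES, lambda e: e.p_accident))
    print("by severity:", highest_by(ESTIMATES, lambda e: e.severity))
    print("by combined index:", highest_by(ESTIMATES, combined_index))
```

In this toy case, crossing A ranks first by probability while crossing B ranks first by severity and by the combined index, which is why the metric should be fixed and justified before any model comparison.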


Your model should be designed to help our rural stakeholders answer the question: "Which crossings represent the highest risk and are the best candidates for safety improvements?"
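Once each crossing has a risk score, answering that question comes down to a ranking. The sketch below filters to rural crossings and returns the highest-scoring ones. The record keys (`crossing_id`, `risk`, `is_rural`) and the scores are hypothetical, and how a crossing is classified as rural is left as an assumption to be resolved from the dataset documentation.

```python
def rank_rural_crossings(records, top_k=3):
    """Return up to top_k rural crossings as (crossing_id, risk), highest risk first.

    Ties are broken by crossing ID so the ordering is deterministic.
    """
    rural = [r for r in records if r["is_rural"]]
    ordered = sorted(rural, key=lambda r: (-r["risk"], r["crossing_id"]))
    return [(r["crossing_id"], r["risk"]) for r in ordered[:top_k]]


# Made-up scores; in practice these would come from the fitted risk model.
RECORDS = [
    {"crossing_id": "C1", "risk": 0.12, "is_rural": True},
    {"crossing_id": "C2", "risk": 0.40, "is_rural": False},
    {"crossing_id": "C3", "risk": 0.31, "is_rural": True},
    {"crossing_id": "C4", "risk": 0.31, "is_rural": True},
    {"crossing_id": "C5", "risk": 0.05, "is_rural": True},
]

if __name__ == "__main__":
    for crossing_id, risk in rank_rural_crossings(RECORDS, top_k=3):
        print(crossing_id, risk)
```

A ranking by risk alone identifies where danger is concentrated; whether a crossing is also a good candidate for improvement may depend on factors such as cost or feasibility, which this sketch does not model.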


Key requirements for this challenge include:


Exploratory Challenge: Role of LLMs

As an optional, exploratory component, we also encourage teams to investigate the potential of Large Language Models (LLMs). Could an LLM be used to:


Stakeholders

To make this challenge as realistic as possible, we are framing it around the needs of key industry leaders who solve these problems every day: ENSCO and MXV.


Your work in this competition directly mirrors the real-world challenges these organizations face. Your goal is to develop a solution that would be genuinely valuable to their data scientists and safety engineers.


Expected Outcomes

A winning submission will be a comprehensive project that combines deep data understanding, sophisticated modeling, and a clear, practical focus on the stakeholders' needs. We are looking for solutions that are not just technically impressive but also actionable and useful for improving transportation safety in rural communities.