Overview & Goal
Highway-rail crossings are critical points in our nation's transportation infrastructure, but they are also a significant source of safety risk. The Federal Railroad Administration (FRA) collects data on every public and private crossing in the United States, and this data is key to understanding and mitigating potential dangers.
The goal of this competition is to develop a model that assesses and predicts accident risk at highway-rail crossings. This challenge is specifically focused on the needs of stakeholders in rural areas. Your insights will help them prioritize safety investments and make data-driven decisions.
The Dataset: FRA Crossing Inventory Data (Form 71)
The primary dataset for this competition is the FRA's Crossing Inventory Data (Form 71). It is the official, authoritative repository of information concerning all U.S. highway-rail interfaces.
This is a rich dataset. For each crossing (identified by a unique U.S. DOT Grade Crossing ID), the inventory includes detailed information on:
Geographical Location: State, county, city, and classification (e.g., rural or urban).
Operating Railroad: Information on the rail carrier.
Physical Characteristics: Number of tracks, type of crossing surface, and crossing angle.
Traffic Control Devices: The type of warning devices present, from passive "crossbucks" and stop signs to active systems with flashing lights and gates.
Highway & Traffic Data: Details on the roadway, such as the number of lanes, and crucial exposure metrics like Average Annual Daily Traffic (AADT).
Rail Data: Operational details like Total Trains Per Day (TTPD) and typical train speeds.
Participants are required to research and develop a deep understanding of this dataset. You will need to consult the official data dictionaries and documentation to interpret the fields correctly.
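As a starting point for working with the exposure fields listed above, the following is a minimal sketch that derives a simple exposure feature from AADT and TTPD. The field names (`aadt`, `trains_per_day`), the crossing ID, and the sample values are hypothetical placeholders; the actual column names must come from the official data dictionary. Multiplying highway traffic by train traffic is a common exposure proxy, not a method prescribed by this competition.

```python
import math

def exposure_features(record):
    """Derive exposure features from one crossing record (a dict).

    Field names are hypothetical; blank or missing values are treated as 0.
    """
    aadt = float(record.get("aadt") or 0)
    trains = float(record.get("trains_per_day") or 0)
    exposure = aadt * trains  # vehicle-train exposure proxy
    return {
        "exposure": exposure,
        # log1p compresses the long right tail and is defined at 0
        "log_exposure": math.log1p(exposure),
    }

# Hypothetical record, with values as strings as they might arrive from a CSV.
sample = {"crossing_id": "000000A", "aadt": "1200", "trains_per_day": "8"}
features = exposure_features(sample)
```

Treating blanks as zero is only one possible choice; depending on how the inventory encodes missing values, imputing or flagging them may be more appropriate.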
The Problems
This competition is divided into two primary challenges, with the second building on the first. A successful project will excel at both.
Challenge 1: Comprehensive Data Analysis
Before you can model risk, you must understand the data. Your first task is to conduct a thorough exploratory data analysis (EDA) of the Form 71 inventory. The goal is to uncover patterns, identify key variables, and communicate your findings effectively.
Potential deliverables for this challenge include:
Statistical Analysis: What are the most common types of crossings? How do warning devices, traffic volumes, and train traffic vary by state or region?
Effective Visualization: Create compelling charts and graphs that reveal relationships between variables.
Dashboard & Map Products: Develop an interactive dashboard or a series of maps that allow stakeholders to explore the data. For example, a map visualizing the density of passive crossings in rural areas could be a powerful tool.
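The kind of summary described in the deliverables above can be sketched with the standard library alone. The records, field names (`state`, `area`, `warning`), and category labels below are invented for illustration and do not reflect the real Form 71 schema; the grouping of crossbucks and stop signs as passive devices follows the description of traffic control devices earlier in this overview.

```python
from collections import Counter, defaultdict

# Hypothetical records; real rows would come from the Form 71 inventory.
crossings = [
    {"state": "VA", "area": "rural", "warning": "crossbucks"},
    {"state": "VA", "area": "rural", "warning": "gates"},
    {"state": "VA", "area": "urban", "warning": "flashing_lights"},
    {"state": "NC", "area": "rural", "warning": "stop_sign"},
    {"state": "NC", "area": "rural", "warning": "crossbucks"},
]

PASSIVE = {"crossbucks", "stop_sign"}  # assumed labels for passive devices

def warning_counts(rows):
    """Count crossings by warning device type."""
    return Counter(row["warning"] for row in rows)

def rural_passive_share_by_state(rows):
    """For each state, the share of rural crossings with passive warnings."""
    totals = defaultdict(int)
    passive = defaultdict(int)
    for row in rows:
        if row["area"] != "rural":
            continue
        totals[row["state"]] += 1
        if row["warning"] in PASSIVE:
            passive[row["state"]] += 1
    return {state: passive[state] / totals[state] for state in totals}

counts = warning_counts(crossings)
shares = rural_passive_share_by_state(crossings)
```

A per-state share like this could feed directly into a map or dashboard layer showing where passive rural crossings are concentrated.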
Challenge 2: Risk Assessment & Prediction Model
The core of this competition is to build a model that assesses and predicts accident risk at highway-rail crossings. You will need to clearly define your "risk" metric: is it the probability of an accident, the predicted severity of an accident, or a combined index?
Your model should be designed to help our rural stakeholders answer the question: "Which crossings represent the highest risk and are the best candidates for safety improvements?"
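One way to make the stakeholder question concrete is to combine accident probability and expected severity into a single index and rank crossings by it. The sketch below assumes a multiplicative index (probability times expected severity), which is only one of the options mentioned above; the crossing IDs, probabilities, and severity values are made up for illustration.

```python
def risk_index(p_accident, expected_severity):
    """Combined index: accident probability times expected severity."""
    if not 0.0 <= p_accident <= 1.0:
        raise ValueError("p_accident must be a probability in [0, 1]")
    return p_accident * expected_severity

def rank_crossings(predictions, top_n=3):
    """Return the top_n (crossing_id, risk) pairs, highest risk first."""
    scored = [
        (cid, risk_index(p, sev)) for cid, (p, sev) in predictions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical model outputs: crossing_id -> (probability, expected severity).
predictions = {
    "X1": (0.02, 1.5),
    "X2": (0.10, 1.0),
    "X3": (0.05, 3.0),
    "X4": (0.01, 1.0),
}
top = rank_crossings(predictions, top_n=2)
```

Here the severity scale is arbitrary; in practice its units (for example, a weighted casualty count) would need to be defined and justified as part of the risk metric.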
Key requirements for this challenge include:
Feature Engineering: Thoughtfully select and engineer variables from the Form 71 data that you hypothesize will predict risk.
Model Development: You are encouraged to explore a wide range of modeling techniques. This could include traditional statistical models (like the regression models used by the FRA's official Accident Prediction System, GXAPS) as well as state-of-the-art Machine Learning (ML) or Deep Learning (DL) approaches.
Model Validation: Clearly explain how your model was trained and validated, and report on its performance using appropriate metrics.
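Because the model is meant to prioritize crossings, one candidate validation metric is the share of accidents in a held-out period that occurred at the top-k ranked crossings. The sketch below is an illustrative assumption, not a required metric: the function name `capture_rate_at_k`, the scores, and the accident counts are hypothetical, and this overview does not specify where accident labels come from.

```python
def capture_rate_at_k(scores, accident_counts, k):
    """Share of held-out accidents that occurred at the k highest-scored crossings.

    scores: crossing_id -> predicted risk (from data before the held-out period)
    accident_counts: crossing_id -> accidents observed in the held-out period
    """
    total = sum(accident_counts.values())
    if total == 0:
        return 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    captured = sum(accident_counts.get(cid, 0) for cid in ranked[:k])
    return captured / total

# Hypothetical predictions and held-out outcomes.
scores = {"X1": 0.03, "X2": 0.10, "X3": 0.15, "X4": 0.01}
held_out = {"X3": 2, "X2": 1, "X4": 1}
rate = capture_rate_at_k(scores, held_out, k=2)
```

Scoring on data from an earlier period and evaluating on a later one helps avoid leaking outcome information into the model; standard metrics such as calibration or AUC can be reported alongside a ranking metric like this one.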
Exploratory Challenge: Role of LLMs
As an optional, exploratory component, we also encourage teams to investigate the potential of Large Language Models (LLMs). Could an LLM be used to:
Analyze textual or categorical data within the inventory?
Generate summaries of risk factors for a specific crossing based on your model's output?
Suggest potential mitigation strategies (e.g., "This crossing's risk score is high due to high TTPD and passive warnings; recommend upgrade to active gates")?
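For the summary and mitigation ideas above, a team might begin by turning a crossing record and model output into a structured prompt. The sketch below only builds the prompt text and does not call any LLM service; the function name, field names, and values are hypothetical.

```python
def build_summary_prompt(crossing, risk_score, top_factors):
    """Build a plain-text prompt describing one crossing and its risk drivers."""
    factor_lines = "\n".join(
        f"- {name}: {value}" for name, value in top_factors
    )
    return (
        "You are assisting a rail safety engineer.\n"
        f"Crossing ID: {crossing['crossing_id']}\n"
        f"Area type: {crossing['area']}\n"
        f"Warning devices: {crossing['warning']}\n"
        f"Model risk score: {risk_score:.2f}\n"
        "Main contributing factors:\n"
        f"{factor_lines}\n"
        "Summarize the risk in two sentences and suggest one mitigation."
    )

# Hypothetical crossing record and model explanation.
crossing = {"crossing_id": "000000A", "area": "rural", "warning": "crossbucks"}
prompt = build_summary_prompt(
    crossing, 0.82, [("trains_per_day", 24), ("warning", "passive")]
)
```

Keeping the prompt grounded in the model's own inputs and outputs makes it easier to check the generated summary against the underlying data.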
Stakeholders
To make this challenge as realistic as possible, we are framing it around the needs of two industry leaders that solve these problems every day: ENSCO and MXV.
ENSCO is a primary technology and engineering partner for the Federal Railroad Administration (FRA). They develop cutting-edge AI and machine learning models for risk assessment and operate the FRA's advanced fleet of track inspection vehicles.
MXV (formerly TTCI) is the world-class research and testing organization for the Association of American Railroads (AAR). They are the industry's core analytics group, developing predictive models to prevent failures and enhance safety across the entire North American rail network.
Your work in this competition directly mirrors the real-world challenges these organizations face. Your goal is to develop a solution that would be genuinely valuable to their data scientists and safety engineers.
Expected Outcomes
A winning submission will be a comprehensive project that combines deep data understanding, sophisticated modeling, and a clear, practical focus on the stakeholders' needs. We are looking for solutions that are not just technically impressive but also actionable and useful for improving transportation safety in rural communities.