Automation Tools Help Railinc Manage Big Data

December 7, 2017

At Railinc, we handle a large volume of rail data every day and use automation tools to support our work. Control-M helps us monitor complex batch processes across multiple platforms and applications and plays a vital role in what we do. Robert Redd, a release engineer administrator at Railinc, wrote the blog post below about how we are using Control-M and big data to help build a smarter rail network. It was originally posted on BMC Software’s website. BMC Software also recently published a case study on Railinc’s use of Control-M.

Established in 1999 to provide IT services for the Association of American Railroads (AAR), Railinc is the railroad industry’s most innovative and reliable resource for IT and information services. Today, as a wholly-owned subsidiary of the AAR, Railinc supports business processes and provides business intelligence that helps the freight rail industry increase productivity, enhance operational efficiency, and increase their return on investment in assets.

Rail industry participants recognize the opportunities big data presents. In response to the critical need for actionable data, we are working with Class I, short line, and regional railroads as well as other Railinc customers to capture huge volumes of data across diverse points in the rail network and help our customers:

Track shipments across the North American freight rail network
Achieve efficiencies around railcar repairs, car hire and other rail operations
Monitor the health of equipment to ensure the safe movement of freight
Better manage traffic to keep railcars moving

Railinc is the largest single source of real-time, accurate, interline rail data for the North American railroad system. That data is empowering our customers to drive efficiency, manage costs, and improve the health of the North American rail network.

To continue meeting our customers’ data needs, we’ve embraced big data and replaced a previous proprietary data warehouse with an open-source environment that offers greater flexibility at a lower cost. Two years ago, we began the move to Hortonworks Hadoop as the framework for storing, processing, and managing the massive volumes of data we handle today and for supporting even greater data volumes in the future.

Control-M is a vital part of our big-data strategy. Railinc has used Control-M for 11 years to schedule and monitor complex batch processes across multiple platforms and applications. Control-M for Hadoop allows us to develop, schedule, and monitor Hadoop batch processes using the same familiar interface we use for our other workloads.

Big Data Brings Unprecedented Visibility

The North American rail network is growing smarter as railroads implement advanced technology such as intelligent sensors positioned alongside tracks. These sensors provide data such as location and movement information that helps customers manage their fleets, track their equipment, view ETAs, efficiently coordinate the movement of millions of railcars, and time the delivery of cargo down to the hour. Still other detectors monitor the physical condition of rolling stock, enabling railroads and car owners to detect issues such as a bad brake or a wheel with a flat spot. These data enable Railinc to provide advance warning so that repairs can be scheduled before a minor issue becomes a costly one.

The volume is staggering. We’re capturing data from more than 40,000 locomotives and 1.6 million railcars traveling across 140,000 miles of track. The data come from equipment belonging to 1,700 different railcar owners, 560+ local and regional railroads, and seven Class I railroads. Our data warehouse already contains 50 terabytes of data from disparate sources, and we expect that volume to increase nearly 100% over the next few years.

Railinc industry applications leverage these data to enable customers to operate more efficiently and economically. Our car hire applications support activities around the fees charged and paid for the usage of rail equipment, enabling higher equipment utilization and improving payment accuracy. Traffic management applications such as our Clear Path™ System support the movement of trains through the Chicago Terminal, the busiest rail gateway in North America.

Control-M Helps Keep Big Data Flowing

To support our industry applications, we must gather huge volumes of data every day from many sources, move it through various systems for analysis and translation into actionable information, and generate and distribute reports to our customers. The workflows that get the data where it needs to be when it needs to be there are highly complex, with numerous dependencies.

That’s where Control-M comes in. It simplifies the creation of even the most complex workloads. Using the graphical interface, I can literally draw the dependencies among jobs, so I can ensure that prerequisite processes in a sequence are completed before the next process in the sequence is started.
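The idea of running each job only after its prerequisites have finished can be illustrated with a small, generic sketch. This is not Control-M's interface or API; it is a plain Python example of dependency ordering using the standard library's `graphlib` module (Python 3.9+), and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job names, invented for illustration only.
# Each job maps to the set of jobs that must complete before it starts.
dependencies = {
    "load_raw_events": set(),
    "load_reference_files": set(),
    "clean_events": {"load_raw_events", "load_reference_files"},
    "build_views": {"clean_events"},
    "generate_report": {"build_views"},
    "distribute_report": {"generate_report"},
}


def run_order(deps):
    """Return an order in which every job follows all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for job in order:
    print(job)
```

A scheduler would also have to handle calendars, retries, and jobs that can run in parallel; the sketch covers only the ordering constraint described above.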

Perhaps the most significant benefit is that the solution isn’t tied to a single technology. When we added Hadoop, Control-M was a natural fit, giving us the same kind of visibility into and control of Hadoop jobs that we experience on other platforms. The scheduling staff didn’t have to learn a special scheduling tool for Hadoop. We use the same interface to schedule workloads on all of our platforms and we have full visibility into the hundreds of jobs that run every night.

Control-M Batch Impact Manager lets us monitor Hadoop jobs without having people sitting in front of a console 24/7. If the solution detects potential delays or failures, it alerts us immediately. That matters because our customers rely on us to meet our service-level agreements (SLAs). Delayed reports on the Chicago Terminal, for example, could affect rail operations not only in the Chicago area, but throughout North America. Batch Impact Manager gives us an intelligent, proactive approach to keeping both processes and trains running.
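As a rough illustration of what detecting a potential delay means, the sketch below projects when a chain of remaining jobs would finish and compares that time with a deadline. It is a simplified assumption for illustration, not a description of how Batch Impact Manager works internally; the durations, times, and function names are hypothetical.

```python
from datetime import datetime, timedelta


def projected_finish(now, remaining_minutes):
    """Estimate the finish time if the remaining jobs run one after another."""
    return now + timedelta(minutes=sum(remaining_minutes))


def sla_at_risk(now, remaining_minutes, deadline):
    """Return True when the projected finish falls after the deadline."""
    return projected_finish(now, remaining_minutes) > deadline


# Hypothetical values: it is 02:00 and the report is due at 05:00.
now = datetime(2017, 12, 7, 2, 0)
deadline = datetime(2017, 12, 7, 5, 0)

# Estimated run times, in minutes, of the jobs still to run.
on_track = [45, 60, 30]    # 135 minutes in total
running_late = [45, 60, 90]  # 195 minutes in total

print(sla_at_risk(now, on_track, deadline))
print(sla_at_risk(now, running_late, deadline))
```

The value of such a check is that the warning arrives while there is still time to act, rather than after the deadline has already passed.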

Another plus for Control-M is that it supports multisite environments. The critical nature of our big-data environment makes it important to have a second site for disaster recovery purposes. So we created a primary and secondary site for our Hadoop environment.

However, we aren’t limiting the second site to DR activities. We need the flexibility to do any job on either site. Control-M can easily talk to both sites, enabling us to ETL to either site, manipulate data, create views, and move processes back and forth between sites as necessary and ultimately replicate IT services and data on both sites. In the future this multisite capability will help with load balancing, enabling us to keep up with the increasing demand for big-data reports.

Conclusion

The transition to Hadoop has added significantly to the number of jobs we run, and Hadoop workflows now account for about one-third of all our batch processes. The orchestration that Control-M performs to get data where it needs to be at the time it needs to be there is critical to the success of our big-data efforts.

We simply couldn’t do the job without Control-M.

—Robert Redd

As release engineer administrator, Robert Redd manages systems that ensure Railinc applications are working for customers and that support the work the company’s developers do to create innovative solutions for the freight rail industry. Redd, who joined Railinc in 2000, is a graduate of the U.S. Military Academy and a U.S. Army veteran.