Inside JP Morgan's Data Environment (2024)

JPMorgan Chase is one of the largest banks in the world, with nearly $130 billion in revenue in 2020. With 50,000 IT employees and an annual IT budget of $12 billion, the company invests heavily to ensure that its technology gives it a competitive advantage. One example is JPMorgan Chase's data infrastructure, which holds more than 450 petabytes of data serving more than 6,500 applications, according to a presentation at AWS re:Invent 2021, including one application that processes 3 billion messages a day.

The bank recognizes the importance of data and shares it widely internally. Yet in a highly regulated industry such as banking, making data too accessible can also lead to disaster.

“To unlock the value of our data, we must solve this paradox,” wrote JPMorgan officials in a 2021 post on the AWS blog. “We must make data easy to share across the organization, while maintaining appropriate control over it.”

Like any large enterprise, JPMorgan kept much of its data in relational databases. As an early big data proponent, JPMorgan had also adopted Hadoop widely, using it to build a monolithic on-premises data lake managed by a central data engineering team. While Hadoop continues to play a key role in analytics at JPMorgan, the bank also recognized that embracing the public cloud could decentralize data ownership and encourage data democratization and business innovation.

JPMorgan Chase first created a comprehensive data architecture built around the concept of “data products”. These are collections of related data that may or may not map to existing business lines or even IT systems. For instance, one JPMorgan Chase data product includes all the data around wholesale credit risk, such as credit exposure, credit rating, and credit facility data, harvested from many different data stores and applications. Another data product focuses on trading and position data, including cash, derivatives, securities, and collateral. Using the term “data product” instead of dataset, repository, or even data asset is meant to shift the mindset by highlighting the goal: enabling data to produce business results rather than gather dust in some forgotten database, according to James Reid, JPMorgan Chase's CIO for Employee Experience and Corporate Technology, in a July 2021 presentation.

Each data product is curated and owned by a team that includes a business owner, a technical owner, and multiple data engineers. The team owns and deeply understands its data product: its uses, its limitations, and its management requirements. Giving each data engineering team end-to-end ownership of a domain also encouraged and empowered it to consolidate the “data puddles” and “data ponds” under its management into a JPMorgan Chase data lake, said Reid.
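
To make the ownership model concrete, here is a minimal sketch of how a data product and its owning team might be described in code. The `DataProduct` class, its fields, and the owner and source names are hypothetical illustrations, not JPMorgan Chase's actual schema; only the wholesale credit risk datasets come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product and the team that owns it."""

    name: str
    business_owner: str
    technical_owner: str
    data_engineers: list[str]
    # Source systems the product's data is harvested from (illustrative names).
    sources: list[str] = field(default_factory=list)
    # Logical datasets the product exposes to consumers.
    datasets: list[str] = field(default_factory=list)

    def owners(self) -> set[str]:
        """Everyone accountable for this product end to end."""
        return {self.business_owner, self.technical_owner, *self.data_engineers}


wholesale_credit_risk = DataProduct(
    name="wholesale-credit-risk",
    business_owner="credit-risk-business-lead",   # hypothetical
    technical_owner="credit-risk-tech-lead",      # hypothetical
    data_engineers=["engineer-a", "engineer-b"],  # hypothetical
    sources=["loan-origination-db", "ratings-service", "facility-ledger"],  # hypothetical
    datasets=["credit_exposure", "credit_rating", "credit_facility"],
)
```

Note that the product's `sources` and `datasets` need not line up one to one: a single product can draw on several systems, which mirrors the point that data products may not map to existing IT systems.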

Each data product is stored in its own physically isolated data lake. Most of these lakes are on Amazon S3, but some remain in on-premises repositories for regulatory reasons, said Reid.

All of these data lakes are cataloged by AWS Glue, Amazon's serverless data integration service. In addition, the applications that employees use to consume data are physically separated from each other as well as from the data lakes. These separate but interconnected domains form JPMorgan's data mesh.

AWS cloud services interconnect the distributed domains. The AWS Glue Data Catalog enables applications and users to find and query the data they need. This enterprise-wide catalog is updated automatically as new data is ingested into the data lakes, checked for quality, and curated by data engineers with domain expertise.
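
A consumer looking for data could search the Glue Data Catalog directly. The sketch below assumes a boto3 Glue client (for example `boto3.client("glue")`) and uses its `search_tables` call, following `NextToken` until every matching table has been returned. The keyword and the database and table names are illustrative only.

```python
from __future__ import annotations

from typing import Any, Iterator


def find_tables(glue_client: Any, keyword: str) -> Iterator[tuple[str, str]]:
    """Yield (database, table) pairs whose catalog metadata matches keyword.

    glue_client is assumed to be a boto3 Glue client; any object with a
    compatible search_tables method works, which also makes testing easy.
    """
    kwargs: dict[str, str] = {"SearchText": keyword}
    while True:
        response = glue_client.search_tables(**kwargs)
        for table in response.get("TableList", []):
            yield table["DatabaseName"], table["Name"]
        # Results are paginated; stop when no continuation token is returned.
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Because the function is a generator, a caller can stop early, for example after the first few matches, without fetching every page.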

The catalog also tracks and audits all data requests and the data flows from the data lakes to applications. This gives JPMorgan Chase data engineers a single point of visibility into how their data is being used, which is key to keeping the bank compliant with the many regulations it faces. The same metadata also helps users find data that they are entitled to use and that is both relevant and trustworthy.
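
The kind of visibility described here can be illustrated with a small aggregation over audit records. The `AccessEvent` record format below is hypothetical and stands in for whatever audit trail the catalog and access-control services actually emit; the point is only that a single log of requests lets a data product's owners see who reads their data and which requests were refused.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical audit record: one consumer requesting one data product."""

    consumer: str      # application or user that made the request
    data_product: str  # data product whose data was requested
    granted: bool      # whether the entitlement check allowed the request


def usage_by_product(events: list[AccessEvent]) -> dict[str, Counter]:
    """Per data product, count granted requests by each consumer."""
    usage: dict[str, Counter] = defaultdict(Counter)
    for event in events:
        if event.granted:
            usage[event.data_product][event.consumer] += 1
    return dict(usage)


def denied_requests(events: list[AccessEvent]) -> list[AccessEvent]:
    """Requests refused by the entitlement check, e.g. for compliance review."""
    return [event for event in events if not event.granted]
```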

Meanwhile, AWS Lake Formation enables data to be shared securely with approved applications and users. Neither applications nor users are ever allowed to copy or store data. This reduces storage costs and prevents the creation of “dark” data silos that lose freshness and accuracy over time, creating data quality and security problems. And without extra copies of data floating around, it's easier to manage data and enforce policies and access controls.
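
As an illustration of sharing without copying, the following sketch grants and revokes table-level read access through the Lake Formation `grant_permissions` and `revoke_permissions` calls as exposed by a boto3 client (for example `boto3.client("lakeformation")`). The helper names, principal ARN, database, and table are hypothetical, and this is not a description of JPMorgan Chase's actual provisioning process.

```python
from __future__ import annotations

from typing import Any


def _table_select_request(principal_arn: str, database: str, table: str) -> dict:
    """Build the arguments shared by a table-level SELECT grant and revoke."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def grant_table_select(lf_client: Any, principal_arn: str, database: str, table: str) -> None:
    """Let one approved principal read a catalog table in place; no copy is made."""
    lf_client.grant_permissions(**_table_select_request(principal_arn, database, table))


def revoke_table_select(lf_client: Any, principal_arn: str, database: str, table: str) -> None:
    """Withdraw that access centrally; there is no downstream copy to track down."""
    lf_client.revoke_permissions(**_table_select_request(principal_arn, database, table))
```

Granting `SELECT` on the table rather than exporting its contents keeps the data in the owning lake, so revoking the grant is enough to cut off access.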

Finally, JPMorgan Chase uses three cloud-based engines to query and process the data: Amazon Athena and Amazon Redshift Spectrum for SQL queries, and Amazon EMR for non-SQL data processing. Machine learning is done with Amazon SageMaker.
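
As a sketch of querying data in place, the function below runs SQL through Amazon Athena using the boto3 calls `start_query_execution`, `get_query_execution`, and `get_query_results`. It assumes the workgroup already has a query result location configured, reads only the first page of results, and uses hypothetical database and table names; it is not JPMorgan Chase's code.

```python
from __future__ import annotations

import time
from typing import Any

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(athena_client: Any, sql: str, database: str,
                     workgroup: str = "primary", poll_seconds: float = 1.0) -> list[list[str]]:
    """Run a SQL query with Amazon Athena and return rows as lists of strings.

    athena_client is assumed to be a boto3 Athena client, e.g.
    boto3.client("athena"). The workgroup is assumed to have a result
    location configured; otherwise ResultConfiguration must also be passed.
    """
    execution_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {execution_id} ended in state {state}")

    # First page only; larger result sets would need NextToken paging.
    result = athena_client.get_query_results(QueryExecutionId=execution_id)
    # NULL values come back as empty cells, mapped here to "".
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]
```

For a `SELECT` query, Athena returns the column names as the first row, so a caller would typically treat `rows[0]` as the header, for example after `rows = run_athena_query(boto3.client("athena"), "SELECT desk, exposure FROM positions LIMIT 10", "trading")`.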

For JPMorgan Chase, its AWS-based data mesh satisfies three key technical priorities: high security, high availability, and easy discoverability. That, in turn, supports the outcomes JPMorgan hopes to achieve with its data: cost savings, business value, and data reuse.

With a framework that instantiates data lakes within a data mesh architecture, JPMorgan Chase has been able to share data across the enterprise while giving data owners the control and visibility they need to manage their data effectively.

Get a demo of the Acceldata Data Observability Platform to learn how your organization can optimize data spend, data operations, and data reliability.