Federated Learning: Training AI Without Centralising Sensitive Data

Some of the most valuable data for training medical AI models sits in hospital electronic health records. The obstacle is that this data is subject to strict privacy regulations and institutional policies that prevent it from being pooled into a single, centralised training dataset. Federated learning offers a way around this: each hospital trains the model on its own data locally, shares only the resulting model weight updates, and a coordinator aggregates those updates into a global model, so raw patient records never leave the hospital.

Federated learning is a distributed machine learning paradigm in which the training data remains on edge devices or local servers. A central server coordinates training in rounds: it distributes the current global model to participating nodes, each node trains on its local data and computes a weight update, and these updates (not the data) are sent back to the server for aggregation. After many rounds, the global model incorporates learning from all participating data sources.
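To make the round structure concrete, here is a minimal sketch of one widely used aggregation rule, Federated Averaging (FedAvg), in plain Python. It assumes a hypothetical toy task: three nodes jointly fit a one-dimensional linear model y = wx + b on noise-free data drawn from y = 2x + 1, and all nodes are simulated in a single process. A real deployment would add networking, client sampling and secure transport. The names `local_train`, `fedavg_round` and `make_node` are illustrative, not taken from any library.

```python
import random

# Hypothetical toy task: each node holds private (x, y) pairs drawn from
# y = 2x + 1, and the nodes jointly fit a 1-D linear model (w, b).

def local_train(weights, data, lr=0.1, epochs=5):
    """Run plain SGD on one node's local data and return the updated weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one example
            w -= lr * err * x       # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err           # gradient of 0.5 * err**2 w.r.t. b
    return w, b

def fedavg_round(global_weights, node_datasets):
    """One FedAvg round: broadcast, train locally, average weighted by sample count."""
    total = sum(len(d) for d in node_datasets)
    avg_w = avg_b = 0.0
    for data in node_datasets:
        # Only the resulting weights leave the node; `data` itself stays local.
        w, b = local_train(global_weights, data)
        share = len(data) / total
        avg_w += share * w
        avg_b += share * b
    return avg_w, avg_b

def make_node(rng, n, lo, hi):
    """Hypothetical node whose inputs cover only [lo, hi] (a non-IID split)."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2 * x + 1) for x in xs]

rng = random.Random(0)
nodes = [make_node(rng, 30, 0.0, 0.5),
         make_node(rng, 50, 0.5, 1.0),
         make_node(rng, 20, 0.0, 1.0)]

weights = (0.0, 0.0)
for _ in range(50):
    weights = fedavg_round(weights, nodes)
print("global model: w=%.3f, b=%.3f" % weights)
```

Weighting each node's result by its sample count is the FedAvg choice; giving every node equal weight would let a small node pull the global model as hard as a large one. Because the three nodes cover different input ranges, the sketch also shows a mild form of statistical heterogeneity, and the global model should still settle close to w = 2, b = 1.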

The applications extend well beyond healthcare. In financial services, banks can collaboratively train fraud detection models across their combined transaction data without sharing commercially sensitive customer records. In manufacturing, competitors can jointly train predictive maintenance models on equipment sensor data without exposing proprietary production information. On mobile devices, federated learning enables personalisation (learning user preferences from device usage patterns) without sending usage data to a central server.

The technical challenges of federated learning are real: communication overhead from transmitting model updates across potentially thousands of participants, statistical heterogeneity when participating data sources have different distributions, and system heterogeneity when participants have different computational capabilities. Google's research on federated learning, which powers on-device language model improvements in Gboard, has produced techniques for making it practical and private: Federated Averaging (FedAvg), which reduces communication by having each participant run several local training steps before sending an update, and differential privacy mechanisms, which clip and add noise to updates so that the shared model reveals little about any individual participant's data.
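The differential privacy idea can be sketched as server-side clipping of each client's update followed by Gaussian noise on the average, in the style of differentially private variants of FedAvg. This is a minimal illustration under stated assumptions: updates are plain Python lists, `max_norm` and `noise_multiplier` are hypothetical parameters, and calibrating them to a formal (ε, δ) guarantee requires a privacy accountant that is not shown here.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm > max_norm:
        return [u * (max_norm / norm) for u in update]
    return list(update)

def private_aggregate(updates, max_norm, noise_multiplier, rng):
    """Average clipped client updates, then add Gaussian noise to the mean.

    After clipping, adding or removing one client (with the denominator n
    held fixed) changes the sum by at most max_norm in L2 norm, i.e. the
    mean by at most max_norm / n, so the noise scale is set relative to
    that bound. noise_multiplier is an assumed, untuned parameter.
    """
    n = len(updates)
    clipped = [clip_update(u, max_norm) for u in updates]
    dim = len(clipped[0])
    mean = [sum(c[i] for c in clipped) / n for i in range(dim)]
    sigma = noise_multiplier * max_norm / n
    return [m + rng.gauss(0.0, sigma) for m in mean]

# Hypothetical client updates; the second is an outlier that clipping tames.
updates = [[0.3, -0.1], [5.0, 5.0], [0.2, 0.0]]
rng = random.Random(42)
noisy_mean = private_aggregate(updates, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noisy_mean)
```

Clipping is what makes the noise meaningful: it bounds how far any one participant can move the average, so noise of a matching scale can mask an individual contribution. In the usage above, the outlier update [5.0, 5.0] is scaled down to unit length before averaging.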

As India's data protection landscape evolves under the Digital Personal Data Protection Act, 2023, federated learning is likely to become an increasingly important technique for enterprises that need AI capability but operate in data-sensitive contexts.