After ChatGPT and other large language models were released, people online started raising concerns about data privacy. Some even suggested stopping all new data from being used to train these models — but that would seriously limit what AI can do in the future.

Luckily, there’s a better way. AI models can still learn from personal data without that data ever being collected in one place. This approach is called federated learning, and in this article, we’ll explore what it is and how it could shape the future of AI training.

Routing Model Updates Through Proxy Servers for Extra Privacy and Speed

Federated learning involves devices training a shared model on their own data and then sending model updates (such as learned parameters or gradients) to be aggregated into a global model. One emerging enhancement is to route these model updates through proxy servers or intermediary nodes instead of sending them directly to the central aggregator. This added indirection can significantly bolster privacy: the aggregator can no longer tell which device contributed which update, giving clients a degree of anonymity.
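To make this concrete, here is a minimal sketch of what such an anonymizing proxy could look like, in plain Python. All names here (AnonymizingProxy, submit, flush) are illustrative assumptions rather than a real federated learning API; the point is simply that the proxy buffers updates, drops client identifiers, and forwards a shuffled batch:

```python
import random

class AnonymizingProxy:
    """Buffers client updates, strips identifiers, and forwards a shuffled batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []

    def submit(self, client_id, update):
        # The proxy receives (client_id, update) but keeps only the update,
        # so the downstream aggregator never learns which device sent
        # which parameters.
        self.buffer.append(update)
        if len(self.buffer) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        # Shuffle before forwarding so arrival order leaks nothing either.
        random.shuffle(self.buffer)
        batch, self.buffer = self.buffer, []
        return batch  # this batch is what the central aggregator sees

proxy = AnonymizingProxy(batch_size=3)
for cid, upd in [("phone-a", [0.1, 0.2]), ("phone-b", [0.3, 0.1]), ("phone-c", [0.2, 0.4])]:
    batch = proxy.submit(cid, upd)
    if batch:
        print("aggregator receives anonymous batch:", batch)
```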

Using a proxy can also improve the speed and efficiency of federated learning. For instance, a proxy can gather and compress updates from many devices before forwarding them, cutting down on communication overhead. Research on “ProxyFL”, a scheme in which participants share lightweight proxy models rather than their private models, found that this kind of indirection not only strengthened privacy (especially when combined with techniques like differential privacy) but also reduced the communication costs of training.

By offloading some coordination duties to proxy nodes (which might be located closer to client devices or handle partial aggregations), federated networks can train faster and scale to more participants. The main idea is that sending updates through trusted proxies adds extra protection for user data and makes learning more efficient — all without raw data ever leaving the users’ devices.
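As a rough illustration of the partial-aggregation idea, the sketch below uses hypothetical names and toy two-parameter “models.” Each proxy averages the updates from its own group of clients and forwards a single message, so the server handles one message per proxy instead of one per device, and the weighted combination still matches a direct average over all clients:

```python
def average(updates):
    """Element-wise mean of a list of parameter vectors (plain Python lists)."""
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]

# Two proxies, each fronting a group of client updates (toy 2-parameter models).
proxy_groups = {
    "proxy-east": [[0.10, 0.20], [0.30, 0.10], [0.20, 0.40]],
    "proxy-west": [[0.50, 0.00], [0.10, 0.30]],
}

partial = {}   # one partial aggregate per proxy
weights = {}   # how many clients each partial aggregate represents
for name, updates in proxy_groups.items():
    partial[name] = average(updates)
    weights[name] = len(updates)

# The server combines the partial aggregates, weighting by client count,
# which yields exactly the same result as averaging all clients directly.
total = sum(weights.values())
dim = len(next(iter(partial.values())))
global_update = [
    sum(partial[p][i] * weights[p] for p in partial) / total
    for i in range(dim)
]
print("global update:", global_update)
```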

Federated Learning vs. Centralized Learning: How It Works

In traditional, centralized training, all the data is sent to one central server or cloud, where the model is trained. Users or organizations have to transfer their original data to one place, which creates obvious privacy risks along with security, compliance, and bandwidth headaches.

Federated learning flips that paradigm. Instead of bringing data to the model, FL brings the model to the data. The server sends an initial model (often just a starting point with random or pre-trained weights) out to many devices (phones, IoT gadgets, or organizational servers). Each device trains the model locally on its own data and sends back only the resulting update, such as new weights or gradients. The server then aggregates these updates (e.g. by averaging them) to improve the global model, which can be sent back to devices in iterative rounds. Crucially, your personal data stays on your device; the server only sees the learned patterns, which are usually far less sensitive than the raw data.
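Here is a deliberately simplified, self-contained sketch of one such training loop in Python, in the spirit of Google’s FedAvg algorithm. The model is just a two-number weight vector and the “devices” are lists of (x, y) pairs; every name here is illustrative, but the shape of the loop (local training, update upload, server-side averaging) is the real one:

```python
def local_train(global_weights, local_data, lr=0.1):
    """Device step: nudge the model toward the device's own data
    (one pass of SGD on a toy linear model y = w0*x + w1) and
    return only the updated weights, never the data itself."""
    w0, w1 = global_weights
    for x, y in local_data:
        pred = w0 * x + w1
        err = pred - y
        w0 -= lr * err * x
        w1 -= lr * err
    return [w0, w1]

def fedavg(updates):
    """Server step: average the device updates into the new global model."""
    return [sum(ws) / len(updates) for ws in zip(*updates)]

# Raw data stays on each "device"; only weight updates travel to the server.
devices = [
    [(1.0, 2.0), (2.0, 4.1)],   # device 1's private data
    [(1.5, 3.2), (3.0, 5.9)],   # device 2's private data
]
global_weights = [0.0, 0.0]
for round_num in range(5):
    updates = [local_train(list(global_weights), data) for data in devices]
    global_weights = fedavg(updates)
    print(f"round {round_num}: global weights = {global_weights}")
```

Real deployments swap the toy linear model for a neural network and often layer secure aggregation or differential privacy on top, but notice that the server-side code path never touches a single (x, y) pair.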

This decentralized training approach was first introduced by Google researchers in 2016, originally to train smartphone models like the Android keyboard’s next-word predictor. In fact, Google and Apple have famously used federated learning to train typing prediction models on millions of smartphones without ever collecting users’ actual typing data. By training on-device, Google’s Gboard and Apple’s QuickType keyboards can improve suggestions based on what people type, while the sensitive text itself never leaves the phones.

Apple has extended the idea to other areas as well. Its Health app, for example, reportedly uses federated learning to analyze usage patterns of health data types across users’ devices while maintaining each user’s privacy. This means the Health app can learn which health metrics or exercises are most popular or effective, collectively, without Apple seeing any individual’s personal health logs.

Beyond consumer apps, federated learning is transforming how organizations collaborate on AI. A notable case study comes from the healthcare sector: Intel and the University of Pennsylvania led a federated learning pilot involving 29 medical institutes around the world to jointly train an AI model for brain tumor detection. None of the hospitals shared their patient data with each other or with Intel; instead, each hospital trained the model on its own MRI scans and then shared model updates. 

The combined model learned from a much larger and more diverse dataset than any single hospital could provide; Intel reported that the federated model reached roughly 99% of the accuracy of a model trained on centrally pooled data. Federated learning thus gave the researchers access to vastly more data (spread across hospitals) while keeping that data protected, a win-win for both model performance and privacy.

