This paper was accepted to the Workshop on Distribution Shifts at NeurIPS 2023.
Large-scale training of models has become exceedingly expensive. In an ever-changing world where petabytes of new data are generated every day, we want to be able to continually train models. In this paper, we create a benchmark for continual large-scale training of CLIP models where the data distribution varies only by time. In contrast with the traditional continual learning literature, there is no hard separation of tasks; i.e., we assume an infinite stream of data arrives in a canonical format and exhibits natural distribution shifts as time passes. We create several such benchmarks for CLIP training based on standard benchmarks such as DataComp and YFCC15M. We propose various evaluations and demonstrate that models trained on data up to a certain year lose performance on certain categories of rapidly changing data. We propose simple learning rate schedules and training with replay buffers to reduce the gap in forward transfer. We demonstrate that a simple baseline that continues training from the last checkpoint and replays past data can be competitive with an Oracle that receives all data to date in a single pass and trains with a large compute budget.
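For concreteness, the sketch below illustrates the flavor of the replay baseline described above: the model is never reinitialized between time steps (so each step warm-starts from the last checkpoint), and each new yearly data shard is mixed with a buffer of previously seen data. This is a minimal toy illustration, not the paper's exact recipe; the linear model, MSE loss, and random tensors are stand-ins for a CLIP encoder pair, the contrastive loss, and the real yearly image-text shards.

```python
# Minimal sketch (toy stand-ins, not the paper's exact protocol) of continual
# training with a replay buffer over a time-ordered data stream.
import torch
from torch import nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

model = nn.Linear(16, 16)                       # stand-in for a CLIP model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                          # stand-in for the contrastive loss
replay_shards = []                              # buffer of previously seen shards

for year in range(2014, 2023):                  # the time-ordered data stream
    new = TensorDataset(torch.randn(256, 16), torch.randn(256, 16))
    # Train on the new shard mixed with replayed old data. The model is not
    # reinitialized, so each step continues from the previous checkpoint.
    loader = DataLoader(ConcatDataset(replay_shards + [new]),
                        batch_size=32, shuffle=True)
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    replay_shards.append(new)                   # keep this shard for replay
    torch.save(model.state_dict(), f"ckpt_{year}.pt")
```

In practice, the replay ratio (here an implicit 1:1 mix via full concatenation) and the per-step compute budget are the key knobs that determine how close such a baseline can get to the Oracle that retrains on all data from scratch.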