We evaluate the performance of Twitter-based influenza surveillance in ten English-speaking countries across four continents. We find that tweets are positively correlated with existing surveillance data provided by government agencies in these countries, with r values ranging from .37 to .81. We show that incorporating Twitter data into a strong autoregressive baseline reduces mean squared error in 80 to 100 percent of locations, depending on the lag, with larger improvements when reporting delays are longer.
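
The comparison described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes an ordinary least squares autoregressive model, a single held-out split, and purely synthetic data. All names (`lagged_design`, `holdout_mse`, `lag`, `order`) and all numeric settings are hypothetical choices made for illustration. The baseline predicts the current surveillance value from values reported at least `lag` weeks earlier, and the augmented model adds the same-week Twitter signal as one extra regressor.

```python
import numpy as np


def lagged_design(y, tweets, lag, order, use_tweets):
    """Build regression rows that predict y[t] from delayed surveillance values.

    Features are an intercept, y[t-lag], ..., y[t-lag-order+1], and,
    if use_tweets is True, the same-week Twitter signal tweets[t].
    """
    start = lag + order - 1  # first t for which all lagged values exist
    rows, targets = [], []
    for t in range(start, len(y)):
        feats = [1.0] + [y[t - lag - j] for j in range(order)]
        if use_tweets:
            feats.append(tweets[t])
        rows.append(feats)
        targets.append(y[t])
    return np.array(rows), np.array(targets)


def holdout_mse(y, tweets, lag, order=3, use_tweets=False, train_frac=0.7):
    """Fit by least squares on the first part of the series, score on the rest."""
    X, target = lagged_design(y, tweets, lag, order, use_tweets)
    split = int(train_frac * len(target))
    coef, *_ = np.linalg.lstsq(X[:split], target[:split], rcond=None)
    resid = target[split:] - X[split:] @ coef
    return float(np.mean(resid ** 2))


# Synthetic stand-in data (an assumption, not data from this study):
# an AR(1) "surveillance" series and a noisy same-week "Twitter" proxy.
rng = np.random.default_rng(0)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()
tweets = y + rng.normal(scale=0.5, size=n)

# Pearson correlation between the two series.
r = float(np.corrcoef(y, tweets)[0, 1])

# Held-out MSE without and with the Twitter regressor, per reporting lag.
results = {
    lag: (
        holdout_mse(y, tweets, lag, use_tweets=False),
        holdout_mse(y, tweets, lag, use_tweets=True),
    )
    for lag in (1, 2, 3)
}

for lag, (mse_base, mse_aug) in results.items():
    print(f"lag={lag}: baseline MSE={mse_base:.3f}, with Twitter MSE={mse_aug:.3f}")
```

Because the synthetic Twitter signal is constructed to track the current surveillance value, the augmented model is expected to do better here by design. The sketch shows only the shape of the comparison and does not reproduce any reported result.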