Prophet Outliers
There are two main ways that outliers can affect Prophet forecasts. Here we make a forecast on the log of Wikipedia page visits for the R page, the same series used in earlier examples, but with a block of bad data:
# R
library(prophet)
df <- read.csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_R_outliers1.csv')
m <- prophet(df)
# Forecast 1096 days (about three years) past the end of the history
future <- make_future_dataframe(m, periods = 1096)
forecast <- predict(m, future)
plot(m, forecast)
# Python
import pandas as pd
from prophet import Prophet

df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_R_outliers1.csv')
m = Prophet()
m.fit(df)
# Forecast 1096 days (about three years) past the end of the history
future = m.make_future_dataframe(periods=1096)
forecast = m.predict(future)
fig = m.plot(forecast)
The trend forecast seems reasonable, but the uncertainty intervals seem way too wide. Prophet is able to handle the outliers in the history, but only by fitting them with trend changes. The uncertainty model then expects future trend changes of similar magnitude.
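One way to put a number on "too wide" is to average the gap between the interval bounds. The sketch below assumes a forecast frame with the `yhat_lower` and `yhat_upper` columns that Prophet's `predict` returns; the helper name `mean_interval_width` and the toy frame are hypothetical and only illustrate the calculation.

```python
import pandas as pd


def mean_interval_width(forecast: pd.DataFrame) -> float:
    """Average width of the uncertainty interval over all forecast rows."""
    return float((forecast["yhat_upper"] - forecast["yhat_lower"]).mean())


# Hypothetical stand-in for the output of m.predict(future)
toy_forecast = pd.DataFrame({
    "ds": pd.date_range("2016-01-01", periods=3, freq="D"),
    "yhat": [8.0, 8.0, 8.0],
    "yhat_lower": [7.0, 6.0, 5.0],
    "yhat_upper": [9.0, 10.0, 11.0],
})

width = mean_interval_width(toy_forecast)
print(width)
```

Computing this on a real forecast before and after cleaning the history gives a rough measure of how much the outliers were inflating the intervals.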
The best way to handle outliers is to remove them, since Prophet has no problem with missing data. If you set their values to NA (None in Python) in the history but leave the dates in future, then Prophet will give you a prediction for their values.
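# R
# Flag every observation that falls strictly inside 2010
outliers <- (as.Date(df$ds) > as.Date('2010-01-01')
             & as.Date(df$ds) < as.Date('2011-01-01'))
# Blank out the values but keep the rows, so the dates remain in the history
df$y[outliers] <- NA
m <- prophet(df)
forecast <- predict(m, future)
plot(m, forecast)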
# Python
# Blank out y for every observation strictly inside 2010; the rows themselves stay
df.loc[(df['ds'] > '2010-01-01') & (df['ds'] < '2011-01-01'), 'y'] = None
model = Prophet().fit(df)
fig = model.plot(model.predict(future))
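To make the masking step concrete without fitting a model, here is a minimal pandas-only sketch on a made-up five-row history (the data is invented for illustration). It applies the same strict-inequality mask as above and shows that the rows and dates are kept while only the `y` values become missing.

```python
import numpy as np
import pandas as pd

# Invented history: ISO date strings, as read_csv would produce, plus a value column
df = pd.DataFrame({
    "ds": pd.date_range("2009-12-30", periods=5, freq="D").strftime("%Y-%m-%d"),
    "y": [1.0, 1.0, 1.0, 9.0, 9.0],
})

# ISO-formatted date strings sort chronologically, so string comparison works here
outliers = (df["ds"] > "2010-01-01") & (df["ds"] < "2011-01-01")
df.loc[outliers, "y"] = np.nan

print(df)
```

Because both comparisons are strict, the boundary date 2010-01-01 itself is left untouched; use `>=` or `<=` if the endpoints should be masked as well.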
In the above example the outliers messed up the uncertainty estimation but did not impact the main forecast yhat. This isn't always the case, as in this example with added outliers:
# R
df <- read.csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_R_outliers2.csv')
m <- prophet(df)
future <- make_future_dataframe(m, periods = 1096)
forecast <- predict(m, future)
plot(m, forecast)
# Python
df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_R_outliers2.csv')
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=1096)
forecast = m.predict(future)
fig = m.plot(forecast)
Here a group of extreme outliers in June 2015 distorts the seasonality estimate, so their effect is repeated in every future year of the forecast. Again the right approach is to remove them:
# R
# Flag the observations strictly between June 1 and June 30, 2015
outliers <- (as.Date(df$ds) > as.Date('2015-06-01')
             & as.Date(df$ds) < as.Date('2015-06-30'))
df$y[outliers] <- NA
m <- prophet(df)
forecast <- predict(m, future)
plot(m, forecast)
# Python
# Blank out y for the observations strictly between June 1 and June 30, 2015
df.loc[(df['ds'] > '2015-06-01') & (df['ds'] < '2015-06-30'), 'y'] = None
m = Prophet().fit(df)
fig = m.plot(m.predict(future))
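The way outliers in a single month can leak into every future year is easy to see with a much cruder seasonal estimate than the one Prophet fits. The sketch below uses plain per-month averages on invented monthly data, not Prophet's own seasonality model, purely as an illustration: one extreme June value shifts the June average, and that shifted average is what would be projected onto every later June.

```python
import numpy as np
import pandas as pd

# Invented data: three years of monthly values, flat at 10, with one extreme June
df = pd.DataFrame({"ds": pd.date_range("2014-01-01", periods=36, freq="MS"), "y": 10.0})
df.loc[df["ds"] == "2015-06-01", "y"] = 100.0


def monthly_profile(frame: pd.DataFrame) -> pd.Series:
    """Crude seasonal estimate: the mean of y for each calendar month."""
    return frame.groupby(frame["ds"].dt.month)["y"].mean()


with_outlier = monthly_profile(df)

# Mask the outlier instead of deleting the row; mean() skips missing values
cleaned = df.copy()
cleaned.loc[cleaned["ds"] == "2015-06-01", "y"] = np.nan
without_outlier = monthly_profile(cleaned)

print(with_outlier.loc[6], without_outlier.loc[6])
```

With the outlier present, the June estimate is pulled well above the other months; once the value is masked, June falls back in line with the rest of the year.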