Wikipedia:Bots/Requests for approval/PrimeBOT 17
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Primefac
Time filed: 14:24, Saturday, May 27, 2017 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): AWB
Source code available: AWB
Function overview: Remove UTM parameters (Google Analytics) from external links and references (i.e. resurrect Theo's Little Bot task #23)
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 55#Remove Google Analytics tracking from external links
Edit period(s): Once a month
Estimated number of pages affected: 16000 in the initial run, and maybe 200 a month after that? Theo's task ran in batches of 500, which also works, but I couldn't then give a timeframe.
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Straightforward find-and-remove. Regex:
\??(?:&?utm_[^=]*?=[^&\s\]\|]*)+(?=]|\s|\|)|(?<=\?)(?:&?utm_[^=]*?=[^&\s\]\|]*)+&
(test cases)
\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&
(tests)
As near as I can tell, I've managed to cover all of the edge cases which were of concern in the original BRFA. The blue section covers the case where ?utm_ is followed by an & not followed by another utm_ (e.g. ?utm_example=1234&para=value). The red hits everything else (i.e. where the utm_ term(s) are only at the end of the URL).
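For anyone who wants to try the pattern outside AWB, the following is a minimal sketch in Python that applies the second regex above verbatim. The function name `strip_utm` and the sample URLs are illustrative assumptions, not part of the bot's configuration. AWB uses the .NET regex engine, so behaviour may differ from Python's `re` module in cases not exercised here.

```python
import re

# The second regex from the request, copied verbatim and split over three
# string literals, one per alternative, for readability.
UTM_PATTERN = re.compile(
    # utm_ terms that run up to a delimiter (<, }, ], whitespace or |)
    r"\??(?:&?utm_[^=\s]*?=[^&\s\]\|]*?)+(?=<|}|]|\s|\|)"
    # utm_ terms directly after "?" that are followed by another parameter
    r"|(?<=\?)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
    # utm_ terms directly after "&" that are followed by another parameter
    r"|(?<=&)(?:&?utm_[^=\s]*?=[^&\s\]\|]*)+&"
)


def strip_utm(wikitext: str) -> str:
    """Return wikitext with the matched utm_ parameters removed (hypothetical helper)."""
    return UTM_PATTERN.sub("", wikitext)


if __name__ == "__main__":
    # Made-up sample links covering each alternative.
    samples = [
        "[http://example.com/page?utm_source=x&utm_medium=y Title]",
        "[http://example.com/page?utm_source=x&id=5 Title]",
        "[http://example.com/page?id=5&utm_source=x&foo=bar Title]",
        "{{cite web|url=http://example.com/?utm_source=x|title=Foo}}",
    ]
    for sample in samples:
        print(strip_utm(sample))
```

The lookahead in the first alternative lists only delimiters such as `]`, `|`, `<`, `}` and whitespace, so in this sketch a URL that ends exactly at the end of the input string is left untouched.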