simple backport script

2024-11-21 15:12:02 +00:00 · 2020-04-14 00:15:58 +03:00 · 2020-04-14 00:15:58 +03:00 · 29bb9f6665
commit 29bb9f6665
parent 1d843df1f3
4 changed files with 229 additions and 9 deletions
--- a/utils/simple-backport/README.md
+++ b/utils/simple-backport/README.md
@@ -52,22 +52,56 @@ $ cat 20.1-report.tsv | cut -f1 | sort | uniq -c | sort -rn
     10 no-backport
 ```

-
 ### How do I label a pull request?
-By default, all pull requests whose description specifies the changelog category Bug fix are backported. If that is not enough, use these labels:
-* v20.1-backported -- this pull request has already been backported to the 20.1 branch. Use it when this was not detected automatically.
+By default, all pull requests whose description specifies the changelog
+category Bug fix are backported. If that is not enough, use these labels:
 * v20.1-no-backport -- no need to backport to the 20.1 branch.
 * pr-no-backport -- no need to backport to any branch.
-* v20.1-conflicts -- backporting to 20.1 produced a conflict. The script skips such pull requests; you can come back to them later.
+* v20.1-conflicts -- backporting to 20.1 produced a conflict. The script skips
+  such pull requests; you can come back to them later.
 * pr-must-backport -- must be backported to the supported branches.
 * v20.1-must-backport -- must be backported to 20.1.

+### I backported it, why doesn't the script see it?
+* The commit message must contain the text backport/cherry-pick #12345, or
+  look like a standard GitHub merge commit for PR #12345.
+* The commit must be reachable via `git log --first-parent my-branch`. Perhaps
+  someone did a pull with merge on the branch, which makes some of the branch's
+  commits unreachable via `--first-parent`.
+
+As a workaround, add an empty commit to the branch with a message like "backport
+#12345 -- real backport commit is <sha>".

 ### I fixed the pull request, why doesn't the script see it?
-While running, the script caches pull request data in the current directory to save GitHub API quota. Delete the cached files, for example, for all pull requests that are not marked as skipped:
+While running, the script caches pull request data in the current directory to
+save GitHub API quota. Delete the cached files, for example, for all pull
+requests that are not marked as skipped:
 ```
 $ cat <your-branch>-report.tsv | grep -v "^skip" | cut -f4
 $ cat <your-branch>-report.tsv | grep -v "^skip" | cut -f4 | xargs rm
 ```

+## How to generate the change log
+In this same directory, run:
+```
+$ time GITHUB_TOKEN=... ./changelog.sh v20.3.4.10-stable v20.3.5.21-stable
+9 PRs added between v20.3.4.10-stable and v20.3.5.21-stable.
+### ClickHouse release v20.3.5.21-stable FIXME as compared to v20.3.4.10-stable

+#### Bug Fix
+
+* Fix 'Different expressions with the same alias' error when query has PREWHERE
+  and WHERE on distributed table and `SET distributed_product_mode = 'local'`.
+[#9871](https://github.com/ClickHouse/ClickHouse/pull/9871) ([Artem
+Zuikov](https://github.com/4ertus2)).
+...
+```
+
+The script prints the changelog to the screen and also saves it to
+`./changelog.md`. Copy this text into the main changelog, check and fix the
+version and release date, and proofread the messages. If any messages are
+wrong, be sure to fix them on GitHub -- this helps when the changelog is later
+generated for other versions that contain the same pull request. To make the
+script pick up the changes from GitHub, delete the corresponding
+`./pr12345.json` files. If you often see badly formatted pull requests,
+consider improving the Description check in CI.
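
The README's rule about which commit messages count as a backport can be tried out without running the scripts. The following is a minimal Python sketch of the same four patterns that `backport.sh` and `changelog.sh` pass to `sed`. The names `PR_PATTERNS` and `find_prs` are hypothetical, and the port is approximate: `sed` uses a greedy `^.*` prefix and so picks the last occurrence on a line, while `re.search` picks the first.

```python
import re

# Approximate Python ports of the four sed patterns (names are hypothetical).
PR_PATTERNS = [
    re.compile(r"Merge pull request #(\d+)"),               # normal GitHub merge commit
    re.compile(r"\(#(\d+)\)$"),                             # squashed commit: "... (#12345)"
    re.compile(r"back[- ]*port *#(\d+)", re.IGNORECASE),    # "backport #12345"
    re.compile(r"cherry[- ]*pick *#(\d+)", re.IGNORECASE),  # "cherry-pick #12345"
]


def find_prs(log_text):
    """Return the PR numbers found in a `git log` dump, at most one per line."""
    found = []
    for line in log_text.splitlines():
        for pattern in PR_PATTERNS:
            match = pattern.search(line)
            if match:
                found.append(int(match.group(1)))
                break  # the first matching pattern wins for this line
    return found
```

Note that a message such as "backported #12345" is not recognized by these patterns, because the `#` has to follow `backport` directly (optionally separated by spaces).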
--- a/utils/simple-backport/backport.sh
+++ b/utils/simple-backport/backport.sh
@@ -10,10 +10,13 @@ merge_base=$(git merge-base origin/master "origin/$branch")
 git log "$merge_base..origin/master" --first-parent > master-log.txt
 git log "$merge_base..origin/$branch" --first-parent > "$branch-log.txt"

+# NOTE keep in sync with ./changelog.sh.
 # Search for PR numbers in commit messages. First variant is normal merge, and second
-# variant is squashed.
+# variant is squashed. Next are some backport message variants.
 find_prs=(sed -n "s/^.*Merge pull request #\([[:digit:]]\+\).*$/\1/p;
-                  s/^.*(#\([[:digit:]]\+\))$/\1/p")
+                  s/^.*(#\([[:digit:]]\+\))$/\1/p;
+                  s/^.*back[- ]*port[ ]*#\([[:digit:]]\+\).*$/\1/Ip;
+                  s/^.*cherry[- ]*pick[ ]*#\([[:digit:]]\+\).*$/\1/Ip")

 "${find_prs[@]}" master-log.txt | sort -rn > master-prs.txt
 "${find_prs[@]}" "$branch-log.txt" | sort -rn > "$branch-prs.txt"
@@ -39,7 +42,7 @@ do
            rm "$file"
            break
        fi
-        sleep 0.5
+        sleep 0.1
    fi

    if ! [ "$pr" == "$(jq -r .number "$file")" ]
@@ -61,7 +64,12 @@ do
    if echo "$labels" | grep -x "pr-must-backport\|v$branch-must-backport" > /dev/null; then action="backport"; fi
    if echo "$labels" | grep -x "v$branch-conflicts" > /dev/null;                       then action="conflict"; fi
    if echo "$labels" | grep -x "pr-no-backport\|v$branch-no-backport" > /dev/null;     then action="no-backport"; fi
-    if echo "$labels" | grep -x "v$branch\|v$branch-backported" > /dev/null;            then action="done"; fi
+    # FIXME Ignore "backported" labels for now. If we can't find the backport commit,
+    # the changelog script won't be able to find it either. An alternative way to
+    # mark a PR as backported is to add an empty commit with a message like
+    # "backport #12345", so that it can be found between tags and put in the
+    # proper place in the changelog.
+    #if echo "$labels" | grep -x "v$branch\|v$branch-backported" > /dev/null;            then action="done"; fi

    # Find merge commit SHA for convenience
    merge_sha="$(jq -r .merge_commit_sha "$file")"
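
The label checks in this hunk run in sequence, so a later check overrides an earlier one. A minimal Python sketch of that precedence follows. The function name `decide_action` is hypothetical, and the initial value of `action` is not visible in this diff, so the `"skip"` default is an assumption based on the `skip` rows that the README filters out of the report.

```python
def decide_action(labels, branch, default_action="skip"):
    """Mirror the order of the label checks; later checks override earlier ones.

    `default_action` is an assumption: the hunk does not show how `action`
    is initialized before the label checks run.
    """
    labels = set(labels)
    action = default_action
    if labels & {"pr-must-backport", f"v{branch}-must-backport"}:
        action = "backport"
    if f"v{branch}-conflicts" in labels:
        action = "conflict"
    if labels & {"pr-no-backport", f"v{branch}-no-backport"}:
        action = "no-backport"
    # "v<branch>" and "v<branch>-backported" are deliberately ignored,
    # matching the check that this commit comments out.
    return action
```

For example, a pull request carrying both `pr-must-backport` and `v20.1-conflicts` ends up as `conflict` for branch `20.1`, and a `v20.1-backported` label alone no longer changes the action.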
--- a/utils/simple-backport/changelog.sh
+++ b/utils/simple-backport/changelog.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+set -e
+
+from="$1"
+to="$2"
+
+git log "$from..$to" --first-parent > "changelog-log.txt"
+
+# NOTE keep in sync with ./backport.sh.
+# Search for PR numbers in commit messages. First variant is normal merge, and second
+# variant is squashed. Next are some backport message variants.
+find_prs=(sed -n "s/^.*Merge pull request #\([[:digit:]]\+\).*$/\1/p;
+                  s/^.*(#\([[:digit:]]\+\))$/\1/p;
+                  s/^.*back[- ]*port[ ]*#\([[:digit:]]\+\).*$/\1/Ip;
+                  s/^.*cherry[- ]*pick[ ]*#\([[:digit:]]\+\).*$/\1/Ip")
+
+"${find_prs[@]}" "changelog-log.txt" | sort -rn > "changelog-prs.txt"
+
+
+echo "$(wc -l < "changelog-prs.txt") PRs added between $from and $to."
+
+function github_download()
+{
+    local url=${1}
+    local file=${2}
+    if ! [ -f "$file" ]
+    then
+        if ! curl -H "Authorization: token $GITHUB_TOKEN" \
+                -sSf "$url" \
+                > "$file"
+        then
+            >&2 echo "Failed to download '$url' to '$file'. Contents: '"
+            >&2 cat "$file"
+            >&2 echo "'."
+            rm "$file"
+            return 1
+        fi
+        sleep 0.1
+    fi
+}
+
+for pr in $(cat "changelog-prs.txt")
+do
+    # Download PR info from github.
+    file="pr$pr.json"
+    github_download "https://api.github.com/repos/ClickHouse/ClickHouse/pulls/$pr" "$file" || continue
+
+    if ! [ "$pr" == "$(jq -r .number "$file")" ]
+    then
+        >&2 echo "Got wrong data for PR #$pr (please check and remove '$file')."
+        continue
+    fi
+
+    # Download author info from github.
+    user_id=$(jq -r .user.id "$file")
+    user_file="user$user_id.json"
+    github_download "$(jq -r .user.url "$file")" "$user_file" || continue
+
+    if ! [ "$user_id" == "$(jq -r .id "$user_file")" ]
+    then
+        >&2 echo "Got wrong data for user #$user_id (please check and remove '$user_file')."
+        continue
+    fi
+done
+
+echo "### ClickHouse release $to FIXME as compared to $from
+" > changelog.md
+./format-changelog.py changelog-prs.txt >> changelog.md
+cat changelog.md
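
The `github_download` function above is a download-once cache: an existing file is trusted, and a failed download must not leave a file behind. The same idea can be sketched in Python as follows. The name `cached_download` and the injected `fetch` callable are hypothetical; `fetch` stands in for the authenticated `curl` call so that the sketch stays independent of the network.

```python
import os
import sys


def cached_download(url, path, fetch):
    """Fetch `url` into `path` unless the file already exists.

    `fetch` is a hypothetical callable that takes a URL and returns bytes.
    Returns True when the file is available and False when the download failed.
    """
    if os.path.isfile(path):
        return True  # cache hit: no request is made, no API quota is spent
    try:
        data = fetch(url)
    except OSError as error:
        print(f"Failed to download '{url}' to '{path}': {error}", file=sys.stderr)
        return False
    # Write only after a successful fetch, so a failure never leaves a
    # partial file that a later run would mistake for valid cached data.
    with open(path, "wb") as out:
        out.write(data)
    return True
```

Because the file is only written after a successful fetch, this variant does not need the `rm "$file"` cleanup step that the shell version performs.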
--- a/utils/simple-backport/format-changelog.py
+++ b/utils/simple-backport/format-changelog.py
@@ -0,0 +1,109 @@
+#!/usr/bin/python3
+
+import os
+import sys
+import itertools
+import argparse
+import json
+import collections
+import re
+
+parser = argparse.ArgumentParser(description='Format changelog for given PRs.')
+parser.add_argument('file', metavar='FILE', type=argparse.FileType('r', encoding='utf-8'), nargs=1, default=sys.stdin, help='File with PR numbers, one per line.')
+args = parser.parse_args()
+
+# This function mirrors the PR description checks in ClickhousePullRequestTrigger.
+# Returns False if the PR should not be mentioned in the changelog.
+def parse_one_pull_request(item):
+    description = item['body']
+    # Don't skip empty lines because they delimit parts of description
+    lines = [line for line in map(lambda x: x.strip(), description.split('\n') if description else [])]
+    lines = [re.sub(r'\s+', ' ', l) for l in lines]
+
+    category = ''
+    entry = ''
+
+    if lines:
+        i = 0
+        while i < len(lines):
+            if re.match(r'(?i).*category.*:$', lines[i]):
+                i += 1
+                if i >= len(lines):
+                    break
+                category = re.sub(r'^[-*\s]*', '', lines[i])
+                i += 1
+            elif re.match(r'(?i)^\**\s*(Short description|Change\s*log entry)', lines[i]):
+                i += 1
+                # Can have one empty line between header and the entry itself. Filter it out.
+                if i < len(lines) and not lines[i]:
+                    i += 1
+                # All following lines until empty one are the changelog entry.
+                entry_lines = []
+                while i < len(lines) and lines[i]:
+                    entry_lines.append(lines[i])
+                    i += 1
+                entry = ' '.join(entry_lines)
+            else:
+                i += 1
+
+    if not category:
+        # Shouldn't happen, because description check in CI should catch such PRs.
+        # Fall through, so that it shows up in output and the user can fix it.
+        category = "NO CL CATEGORY"
+
+    # Filter out the PR categories that are not for changelog.
+    if re.match(r'(?i)doc|((non|in|not|un)[-\s]*significant)', category):
+        return False
+
+    if not entry:
+        # Shouldn't happen, because description check in CI should catch such PRs.
+        category = "NO CL ENTRY"
+        entry = "NO CL ENTRY:  '" + item['title'] + "'"
+
+    entry = entry.strip()
+    if entry[-1] != '.':
+        entry += '.'
+
+    item['entry'] = entry
+    item['category'] = category
+
+    return True
+
+
+category_to_pr = collections.defaultdict(lambda: [])
+users = {}
+for line in args.file[0]:
+    pr = json.loads(open(f'pr{line.strip()}.json').read())
+    assert(pr['number'])
+    if not parse_one_pull_request(pr):
+        continue
+
+    assert(pr['category'])
+    category_to_pr[pr['category']].append(pr)
+    user_id = pr['user']['id']
+    users[user_id] = json.loads(open(f'user{user_id}.json').read())
+
+def print_category(category):
+    print("#### " + category)
+    print()
+    for pr in category_to_pr[category]:
+        user = users[pr["user"]["id"]]
+        user_name = user["name"] if user["name"] else user["login"]
+
+        # Substitute issue links
+        pr["entry"] = re.sub(r'#([0-9]{4,})', r'[#\1](https://github.com/ClickHouse/ClickHouse/issues/\1)', pr["entry"])
+
+        print(f'* {pr["entry"]} [#{pr["number"]}]({pr["html_url"]}) ([{user_name}]({user["html_url"]})).')
+
+    print()
+
+# Print categories in preferred order
+categories_preferred_order = ['Backward Incompatible Change', 'New Feature', 'Bug Fix', 'Improvement', 'Performance Improvement', 'Build/Testing/Packaging Improvement', 'Other']
+for category in categories_preferred_order:
+    if category in category_to_pr:
+        print_category(category)
+        category_to_pr.pop(category)
+
+# Print the rest of the categories
+for category in category_to_pr:
+    print_category(category)
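
Two regular expressions in `format-changelog.py` are easy to misread: the category filter and the issue-link substitution. The sketch below isolates them as small functions so their behavior can be checked on sample strings. The function names are hypothetical; the patterns are copied from the script.

```python
import re


def is_changelog_category(category):
    """Return False for categories that the script leaves out of the changelog.

    `re.match` anchors at the start of the string, so only categories that
    begin with "doc" or with a "non-significant"-style word are filtered out.
    """
    return not re.match(r"(?i)doc|((non|in|not|un)[-\s]*significant)", category)


def link_issues(entry):
    """Turn "#NNNN" references with at least four digits into issue links."""
    return re.sub(
        r"#([0-9]{4,})",
        r"[#\1](https://github.com/ClickHouse/ClickHouse/issues/\1)",
        entry,
    )
```

Because of the start anchor, a category such as "Not for changelog" would pass the filter and appear in the output, and references with fewer than four digits, such as `#123`, are left as plain text.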