#8350 non-unicode filenames in hg

unreleased
closed
None
General
nobody
2020-02-13
2020-02-11
No

With a filename that is not valid UTF-8 (a "non-unicode" filename), this error is thrown:

  File "/src/forgehg/forgehg/model/hg.py", line 324, in refresh_commit_info
    fake_tree = self._tree_from_changectx(obj)
  File "/src/timermiddleware/timermiddleware/__init__.py", line 120, in wrapper
    return self.run_and_log(func, inst, *args, **kwargs)
  File "/src/timermiddleware/timermiddleware/__init__.py", line 152, in run_and_log
    retval = func(*args, **kwargs)
  File "/src/forgehg/forgehg/model/hg.py", line 453, in _tree_from_changectx
    root.set_blob(filepath, oid)
  File "/src/allura/Allura/allura/model/repository.py", line 1847, in set_blob
    path = six.ensure_text(path)
  File "/var/local/env-allura/lib/python2.7/site-packages/six.py", line 904, in ensure_text
    return s.decode(encoding, errors)
  File "/var/local/env-allura/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 5: invalid continuation byte
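For context, here is a short Python 3 sketch of why the decode fails (the ticket's environment is Python 2.7, so the codec name in the message differs slightly). It assumes the filename is the same one used in the example further down in this ticket, which produces the identical error message; the traceback itself does not show the full name.

```python
# Assumed filename bytes: "data/" followed by a name that is not valid UTF-8.
raw = b"data/\xca\xee\xef\xe8\xff scene.txt"

# 0xCA (0b11001010) is a UTF-8 lead byte announcing a 2-byte sequence, so the
# byte after it must be a continuation byte of the form 0b10xxxxxx.
lead, nxt = raw[5], raw[6]
print(f"lead byte 0x{lead:02x} -> {lead:08b}")
print(f"next byte 0x{nxt:02x} -> {nxt:08b}")

# 0xEE is 0b11101110, whose top two bits are 11, not 10.
is_continuation = (nxt & 0b11000000) == 0b10000000
print("next byte is a continuation byte:", is_continuation)

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; exc is cleared when the block ends
    print(exc)   # 'utf-8' codec can't decode byte 0xca in position 5: invalid continuation byte
```

Position 5 is simply the first byte after the five ASCII characters of "data/".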

Discussion

  • Dave Brondsema

    Dave Brondsema - 2020-02-11
    • status: in-progress --> review
     
  • Dave Brondsema

    Dave Brondsema - 2020-02-11

    Fixed on db/8350. This illustrates how a name in a different encoding is handled:

    >>> 'data/\xCA\xEE\xEF\xE8\xFF scene.txt'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/var/local/env-allura/lib64/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 5: invalid continuation byte
    >>> h.really_unicode('data/\xCA\xEE\xEF\xE8\xFF scene.txt')
    u'data/\u041a\u043e\u043f\u0438\u044f scene.txt'
    >>> print h.really_unicode('data/\xCA\xEE\xEF\xE8\xFF scene.txt')
    data/Копия scene.txt
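The general idea behind that fallback can be sketched in Python 3 as follows. This is not Allura's h.really_unicode; decode_filename and the hard-coded candidate list are hypothetical, chosen so that the example bytes decode as cp1251 (Windows Cyrillic), which is what the output above corresponds to.

```python
from typing import Tuple

# Hypothetical candidate list. Order matters: cp1251 accepts almost any byte
# sequence, so whatever comes first after utf-8 wins for non-UTF-8 input.
CANDIDATES = ("utf-8", "cp1251", "latin-1")


def decode_filename(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for encoding in CANDIDATES:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte value, so with the list above this is unreachable.
    raise ValueError("no candidate encoding matched")


text, enc = decode_filename(b"data/\xca\xee\xef\xe8\xff scene.txt")
print(enc)  # cp1251  (text is 'data/Копия scene.txt')
```

A fixed list is only a guess; a real helper would more likely use charset detection, and any guess can pick the wrong encoding for short names.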
    

    Unfortunately, this fix only gets directory browsing working. Trying to view or diff the file raises "ManifestLookupError: data/Копия scene.txt@a18ff7d3ef0d: not found in manifest". We converted the filename to unicode for Mongo and web purposes, but when the file is then requested from the hg repo, the name is encoded differently (as UTF-8), so it no longer matches the original bytes stored in the manifest. I don't know how to deal with that.
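One possible direction, sketched here as an assumption rather than what was implemented: record which encoding was used to decode each name, and re-encode with that same encoding before looking the file up in hg. In this Python 3 sketch a plain dict stands in for the hg manifest, and to_repo_path is a hypothetical helper, not part of Mercurial or Allura.

```python
# Hypothetical stand-in for the hg manifest, keyed by the raw filename bytes
# exactly as they were committed (not the real Mercurial API).
raw_name = b"data/\xca\xee\xef\xe8\xff scene.txt"
fake_manifest = {raw_name: b"file contents"}

# The unicode name kept for Mongo/web use, plus the encoding it came from.
display_name = raw_name.decode("cp1251")
decoded_with = "cp1251"  # assumption: stored alongside the display name

# The failing path: re-encoding as UTF-8 produces different bytes.
utf8_key = display_name.encode("utf-8")
print(utf8_key == raw_name)         # False
print(fake_manifest.get(utf8_key))  # None, i.e. "not found in manifest"


def to_repo_path(name: str, encoding: str) -> bytes:
    """Hypothetical helper: turn a display name back into manifest bytes."""
    return name.encode(encoding)


# Using the same encoding in both directions restores the original bytes.
repo_key = to_repo_path(display_name, decoded_with)
print(repo_key == raw_name)         # True
print(fake_manifest.get(repo_key))  # b'file contents'
```

An alternative with the same effect is to keep an explicit index from display name to original raw bytes while walking the manifest, which avoids relying on the encoding guess being reversible.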

     
  • Kenton Taylor

    Kenton Taylor - 2020-02-13
    • status: review --> closed
     
