#7900 Issue with encoding of filenames in git snapshot zip

unreleased
invalid
sf-1 (616)
General
nobody
2015-06-29
2015-06-18
No

If a git repo has a filename with unicode characters and you download a snapshot zip file and unzip it, the filename will probably be wrong.

Example: "Download Snapshot" link on https://sourceforge.net/p/compilibre/code/ci/master/tree/ and then "COMPILIBRE-source/COMPILIBRE/installer/Cliquez ici pour démarrer COMPILIBRE.bat" can end up as "Cliquez ici pour d+?marrer COMPILIBRE.bat" or "Cliquez ici pour démarrer COMPILIBRE.bat"

I wonder how easy this is to fix? Does it matter what your git settings are, or how the file was added to the repo? Or your local filesystem?

Original report: https://sourceforge.net/p/forge/feature-requests/353/

Discussion

  • Heith Seewald - 2015-06-18

    So you're saying this example will work some of the time?

     
    • Dave Brondsema - 2015-06-18

      This example reproduces the problem every time (although extracting on OSX ends up OK, whereas Linux shows wrong characters).

      I used the word "probably" because I don't know what triggers the issue and it might be dependent on how the file and repo are set up.

       
  • Heith Seewald - 2015-06-18

    Ok, that is strange. I asked because I extracted it on OSX and it worked correctly.

     
  • Igor Bondarenko - 2015-06-19

    I'm getting "Cliquez ici pour d├йmarrer COMPILIBRE.bat" on Linux

     
  • Heith Seewald - 2015-06-19

    I had the same issue on Linux using unzip, but 7zip properly detected the unicode names.

    We might be able to address it server side to some degree with something like: zip archive dir -r -UN=UTF8

    Or possibly upgrade the version of zip on the server to 3.0+.

    ...zip 3.0, in addition to the standard file path, now includes the UTF-8 translation of the path if the entry path is not entirely 7-bit ASCII. When an entry is missing the Unicode path, zip reverts back to the standard file path.
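To make the quoted behaviour concrete, here is a minimal Python sketch of how a reader could look for that extra UTF-8 path. It assumes the Info-ZIP "Unicode Path" extra field layout (header ID 0x7075, then a version byte, a CRC-32 of the standard path, and the UTF-8 path). The helper names and the sample paths are made up for illustration, and this is not how Allura builds or reads its snapshots.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field (assumed layout)


def unicode_path_from_extra(extra):
    """Return the UTF-8 path stored in a ZIP extra field, or None if absent."""
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        data = extra[offset + 4 : offset + 4 + size]
        if header_id == UNICODE_PATH_ID and len(data) >= 5:
            # data = version (1 byte) + CRC-32 of standard path (4 bytes) + UTF-8 path
            return data[5:].decode("utf-8")
        offset += 4 + size
    return None


def build_unicode_path_extra(standard_path, utf8_path):
    """Build a sample extra field (hypothetical helper, for demonstration only)."""
    body = struct.pack("<BI", 1, zlib.crc32(standard_path) & 0xFFFFFFFF) + utf8_path
    return struct.pack("<HH", UNICODE_PATH_ID, len(body)) + body


extra = build_unicode_path_extra(b"d_marrer.txt", "démarrer.txt".encode("utf-8"))
print(unicode_path_from_extra(extra))
```

An entry without this field returns None, which corresponds to the "reverts back to the standard file path" case in the quote.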

     
    • Dave Brondsema - 2015-06-19

      I think we create the zip files for git with git archive, not the zip command. Quick googling turns up a long thread about the topic: http://git.661346.n2.nabble.com/git-archive-format-zip-utf-8-issues-td7564798.html I didn't read it all; I'll let whoever takes this ticket figure out what the results are. It sounds like they made some improvements during the course of that thread, so versions of git after that date might be better than others.

       
      • Dave Brondsema - 2015-06-19

        And the example referenced (on sourceforge.net) is using git 1.7.4.1 which is fairly old.

        Work on this ticket should try to reproduce in that version of git, and then see if it's better with a new version of git.
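For whoever picks this up, a rough Python sketch of one way to compare git versions: build the archive with a chosen git binary and check whether non-ASCII entries carry the ZIP "name is UTF-8" flag (general purpose bit 11). It assumes the snapshot comes from `git archive --format=zip`, which is only a guess in this thread, and that the flag is what differs between git versions, which is also unconfirmed here. The function names are hypothetical.

```python
import io
import subprocess
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is UTF-8


def git_archive_zip(repo_dir, git_exe="git", treeish="HEAD"):
    """Run `git archive --format=zip` with the given git binary; return zip bytes."""
    return subprocess.check_output(
        [git_exe, "archive", "--format=zip", treeish], cwd=repo_dir
    )


def non_ascii_entries(zip_bytes):
    """List (name, utf8_flag_set) for entries whose names are not pure ASCII."""
    result = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Without the flag, zipfile decodes names as cp437, so a UTF-8 name
            # shows up here as mojibake with utf8_flag_set == False.
            if any(ord(ch) > 127 for ch in info.filename):
                result.append((info.filename, bool(info.flag_bits & UTF8_FLAG)))
    return result
```

Calling `non_ascii_entries(git_archive_zip(repo_dir, git_exe=path_to_git))` once per installed git version would show whether the flag changes between them; `repo_dir` and `path_to_git` are placeholders for a local clone and a specific git binary.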

         
  • Dave Brondsema - 2015-06-19

    Looks like it does depend on both the version of git on the server, and the version of unzip on the client.

    • Git 1.7.4.1, unzip 6.0.0: d├йmarrer.txt
    • Git 2.1.3, unzip 5.5.2: d+?marrer.txt
    • Git 2.1.3, unzip 6.0.0: démarrer.txt
     

    Last edit: Dave Brondsema 2015-06-19
    • Dave Brondsema - 2015-06-19

      To use different versions of git, it can be useful to set the GIT_PYTHON_GIT_EXECUTABLE env var for the taskd process.

       
    • Dave Brondsema - 2015-06-22
      • Git 1.7.4.1, Windows Vista's built-in unzip: d+¬marrer.txt
      • Git 2.1.3, Windows Vista's built-in unzip: d+¬marrer.txt
      • Git 1.7.4.1, 7zip 9.20 (windows): d├⌐marrer.txt
      • Git 2.1.3, 7zip 9.20 (windows): démarrer.txt

      So Git 2.1.3 still isn't sufficient with Windows Vista, but is better in many other cases.
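A small Python sketch of one plausible explanation for some of the garbled names reported in this thread: the archive stores the name as UTF-8 bytes, and the unzip tool decodes those bytes with a legacy code page. This is an assumption about what the tools do, not something confirmed here, and it only reproduces the "├⌐" and "├й" variants, not the "+¬" or "+?" ones.

```python
name = "démarrer.txt"
raw = name.encode("utf-8")  # "é" becomes the two bytes C3 A9


def misread(raw_bytes, codec):
    """Decode UTF-8 bytes with the wrong codec, as a confused unzip tool might."""
    return raw_bytes.decode(codec)


print(misread(raw, "cp437"))  # d├⌐marrer.txt
print(misread(raw, "cp866"))  # d├йmarrer.txt
print(misread(raw, "utf-8"))  # démarrer.txt
```

cp437 is the traditional default encoding for ZIP entry names, and cp866 is a Cyrillic DOS code page, which would fit the "й" variant if the client's locale selects it.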

       
  • Dave Brondsema - 2015-06-22
    • labels: sf-current --> sf-current, sf-1
    • status: open --> invalid
    • assigned_to: Dave Brondsema
     
  • Dave Brondsema - 2015-06-22

    Seems like this isn't an Allura issue, but rather one between the version of Git and the unzipping software used. Upgrading to a current version of git on the server will help.

     
  • Dave Brondsema - 2015-06-29
    • labels: sf-current, sf-1 --> sf-1
     
