Building MS-DOS 4.00 on FreeDOS [video] (youtube.com)
73 points by zdw 16 days ago | 15 comments



I learned a lot just reading the blurb for their YouTube video:

You probably saw recently that Microsoft and IBM released the source code to MS-DOS 4.00 on GitHub. This is under the MIT License, which is an open source license and compatible with the GNU General Public License that we use in FreeDOS.

But if you tried to compile the source code release, you may have hit some problems. I understand it builds fine under MS-DOS, which is what you'd expect. But under FreeDOS, you have to clean up the source code a bit before you can build it. The short version of what's going on is that when they put the code into GitHub, GitHub used UTF-8 for encoding, and DOS uses Code Page 437.

Here's how to clean it up. (Visit our website https://www.freedos.org/ )

Is that really the license? I thought Microsoft released it under an "except commercial" educational license?


Your memory is correct, that was the case when they first released the DOS source code to the Computer History Museum, but they have since relicensed all open source releases of DOS under the MIT license.


> All files within this repo are released under the MIT License as per the LICENSE file stored in the root of this repo.


Here's the mailing list post mentioned in the video, with the details on how to reformat the code:

https://sourceforge.net/p/freedos/mailman/message/58765259/

Also, there's a simpler way to do the LF -> CRLF conversion, as found by neozeed from Virtually Fun: Just zip the source code, then unzip it again using the `unzip -a` command (the `-a` switch performs the auto-conversion)
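For anyone who would rather script the line-ending step, here is a minimal Python sketch of the same LF -> CRLF conversion. The function name `lf_to_crlf` is made up for illustration, and unlike `unzip -a` it does no text/binary detection, so it assumes you only feed it text files.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF.

    Existing CRLF pairs are first collapsed to LF so they are not
    turned into CR CR LF by the second replace.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    sample = b"line one\nline two\r\nline three\n"
    print(lf_to_crlf(sample))
```

Working on bytes rather than decoded text matters here: it leaves the Code Page 437 bytes in the file untouched.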


> The short version of what's going on is that when they put the code into GitHub, GitHub used UTF-8 for encoding, and DOS uses Code Page 437.

So why isn’t this a problem if you’re compiling it under DOS?


> > The short version of what's going on is that when they put the code into GitHub, GitHub used UTF-8 for encoding, and DOS uses Code Page 437.

I don’t think that’s the actual problem. GitHub does not change the encoding of files in Git repos. Nor (generally speaking) does Git itself [0].

It appears most of the encoding issues were introduced by manually editing files, in response to a corporate policy that certain comments be censored (mainly comments that could be viewed as unprofessional). While it is possible to do this without altering the encoding of the files, it appears the person who did it used an editor that rewrote the file encoding, and didn’t realise it was doing that.

[0] Git can be configured to do automatic conversion between bare LF and CRLF newline formats. That’s generally a bad idea. It is mainly people on Windows who turn it on, although nowadays many Windows tools can handle Unix style line endings without issue, making it largely unnecessary. Also, you can install Git repository hooks that alter file contents as they are being committed, although the majority of repos don’t have any such hooks installed


Some tools - even cross-platform ones - will create CRLF files on Windows by default (looking at you JetBrains) so when working on Windows I usually like to have autocrlf set to 'input' to avoid accidentally committing CRLF files into the repository.


Respecting platform conventions is the sensible default, particularly for a platform that represents over 80% of desktop marketshare.

This problem used to be a whole lot worse under pre-OS X versions of macOS that used CR (just CR, no LF) as the line separator. At least now there are only 2 commonly used conventions and you can essentially just ignore any CRs you encounter in most cases.


> This problem used to be a whole lot worse under pre-OS X versions of macOS that used CR (just CR, no LF) as the line separator.

That bare CR newline convention used to be very widespread – not just Classic MacOS, also many 8-bit micros (Commodore, Acorn, Apple II, TRS-80, ZX Spectrum, HP-85), Oberon, MIT Lisp Machines and Microware OS-9. But, by the late 1990s, Classic MacOS was the only one of those systems with any mainstream significance. And now you'll only encounter bare CR newlines in retrocomputing or obscure legacy systems – oh, and raw mode terminal input.


> Respecting platform conventions is the sensible default

That's fair, but I'd argue an equally sensible default would be to respect the conventions of the language ecosystems you're working within, and for Java, Python, Rust, that's LF. IntelliJ provides a lot of configuration on a per-language basis but line separators for new files is a global setting for some reason.


I have a few toy programming languages I created as a hobby / learning exercise. (Most of them I’ve never released, maybe some day.)

In one of them, I decided to make carriage returns a lexical error. In fact, the only C0 control it allows in source text is LF, it doesn’t even allow tabs (my personal answer to the perennial spaces-vs-tabs debate).
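A rule like that takes only a few lines to enforce. The following is a rough Python sketch, not the actual implementation; `LexicalError` and `check_source` are hypothetical names, and it treats C0 as U+0000 through U+001F.

```python
class LexicalError(Exception):
    """Raised when source text contains a forbidden control character."""


def check_source(text: str) -> None:
    """Reject every C0 control character (U+0000..U+001F) except LF."""
    for offset, ch in enumerate(text):
        if ord(ch) < 0x20 and ch != "\n":
            raise LexicalError(
                f"forbidden control character U+{ord(ch):04X} at offset {offset}"
            )


if __name__ == "__main__":
    check_source("let x = 1\nlet y = 2\n")  # passes silently
    try:
        check_source("let x = 1\r\n")
    except LexicalError as err:
        print(err)
```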


Ok, but again, why is this codepage workaround only required under FreeDOS?


It's the same under "real" DOS. The code from GitHub doesn't build without fixing the line endings and encoding.

What seems to have happened is that at some point they opened it as UTF-8 in an editor, which replaced some Cp437 box drawing characters with U+FFFD (encoded as the byte sequence EF BF BD). You can replace these with any single byte and it will build, but making the comments and some TUI stuff look correct requires more editing.
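To make that concrete, here is a small Python sketch of both halves: how a Cp437 byte can end up as EF BF BD, and the crude "replace with any single byte" fix. The helper names and the choice of `-` as filler are my own assumptions, and the original byte values are not recoverable this way.

```python
# UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER: EF BF BD
REPLACEMENT = "\ufffd".encode("utf-8")


def simulate_damage(cp437_bytes: bytes) -> bytes:
    """Mimic an editor that decodes the file as UTF-8 and saves it as UTF-8.

    Bytes that are not valid UTF-8 become U+FFFD on decode.
    """
    return cp437_bytes.decode("utf-8", errors="replace").encode("utf-8")


def patch_replacement_chars(data: bytes, filler: bytes = b"-") -> bytes:
    """Swap each EF BF BD sequence for one filler byte so the file builds."""
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    return data.replace(REPLACEMENT, filler)


if __name__ == "__main__":
    original = b"; A\xcdB"  # 0xCD is a double horizontal line in Cp437
    damaged = simulate_damage(original)
    print(damaged)
    print(patch_replacement_chars(damaged))
```

Not every high byte gets damaged in the same way: some Cp437 byte pairs happen to form valid UTF-8 sequences and would come out as a different character instead of U+FFFD.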


UTF-8 and Code Page 437 are identical for all the characters <127.

Typical source code doesn't use characters outside that range anyway?
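A quick way to check this claim is with Python's built-in `cp437` codec. This is just a sketch: `same_bytes` is a name I made up, and the sample strings are illustrative rather than taken from the MS-DOS source.

```python
def same_bytes(text: str) -> bool:
    """True if text encodes to identical bytes in Cp437 and UTF-8."""
    return text.encode("cp437") == text.encode("utf-8")


if __name__ == "__main__":
    # Plain ASCII, as in most assembly source lines
    print(same_bytes("MOV AX, 4C00h"))
    # A box drawing character, as used in comment banners and TUI text
    box = "\u2550"  # double horizontal line
    print(same_bytes(box))
    print(box.encode("cp437"), box.encode("utf-8"))
```

So pure ASCII files are unaffected, and the difference only shows up where the source contains characters above that range, such as box drawing characters.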


I'm wondering what uses more memory: the "buggy and bloated", but written in assembly language MS-DOS 4.0, or FreeDOS, which is almost 100% in C?

To make it a fair comparison, you would have to load FreeDOS in conventional memory (DOS=LOW) too, since 4.0 doesn't support anything else I think.



