Category: Projects

Does pyqtdeploy even work?

I know nobody is going to read this terrible blog to find this, but still, I’m moderately frustrated trying to find a decent workflow for deploying a small, single-executable, Python-based Qt application.

Even on Windows using C++, building statically was not so easy until I found the static Qt libraries in the MinGW/MSYS2 repository; after that, building statically became a magical experience.

So far, the only deployment tools that promise to deploy a Python Qt program as a single executable are PyInstaller and pyqtdeploy.

PyInstaller works by freezing everything, creating an archive inside the executable with the minimum number of modules necessary to run, invoking UPX on these modules, and then when the program is run, it extracts everything to a temporary folder and runs the actual program from there. As such, startup times seem to be around 3-5 seconds, and the size of the executable is about 30 MB.

pyqtdeploy works by freezing your code, turning it into a Qt project with some pyqtdeploy-specific glue code, and then compiling that project as if it were a C++ project, so that you can link a static build of Qt against the generated code.

But in order to use pyqtdeploy, you need to have the libraries at hand for linking:

LIBS += -lQtCore
LIBS += -lQtGui
LIBS += -lpython36
LIBS += -lsip

There’s no way around it – you must build Python and the other dependencies from scratch, and this could take a long time.

I have also encountered strange errors such as SOLE_AUTHENTICATION_SERVICE being undefined in the Windows API headers.

I mean, I suppose pyqtdeploy works, but is this even a path worth going down? What would be the pre-UPX size of such an executable – 25 MB, perhaps? That would put it on par with the AO executable.

I might as well write the launcher in C++, or switch to Tkinter.

A humanitarian mission for mesh networking

After Hurricane Maria, I was invited to a Slack group in Puerto Rico to offer my programming expertise for anyone who needed it. After beginning to comprehend the magnitude of the communications problem, I scoured for ways to set up long-distance mesh networking – no, not mobile apps like FireChat that rely on short-distance Wi-Fi or Bluetooth to establish limited local communications – rather, ways to post and find information across the entire island, with relays that could connect through the limited submarine cables to the outside Internet as a gateway for government agencies and worried relatives.

During the three weeks I spent interested in this project (but powerless to do anything, as I was taking classes), I investigated existing technologies such as 802.11s, the capabilities of router firmware, the theoretical ranges of high-gain antennas, and other existing projects.

I saw Project Loon, but never expected much of it. The project must have taken a great deal of effort to take off, but unfortunately, it seemed to have a high cost with little return. Essentially, balloons were sent from some point on Earth and then led by high-altitude winds to cross Puerto Rico for a few hours, eventually to land at some location in the United States. Despite this effort, I found very few reports of actual reception from a Project Loon balloon.

Meanwhile, someone in the mesh networking Slack channel informed me that they were working with a professor at A&M to implement a mesh network from a paper that had already been written. While I ultimately never saw that mesh network implemented, I felt humbled by my naivete; accepting that my plans were undeveloped and unexecutable, I moved on with the semester. Surely, mobile carriers must have had all hands on deck to reestablish cell phone coverage as quickly as possible, which is certainly the best long-term solution to the issue.

However, many places other than Puerto Rico remain in dire need of communications infrastructure: towns and villages where for-profit carriers have no interest in providing coverage. Moreover, there are islands at risk of being cut off entirely in the event of a hurricane.

I am looking to start a humanitarian mission to set up a mesh network. I find that there are three major characteristics of a successful mesh network: resilience, reach, and time to deploy.

A mesh network that is not resilient is flimsy: one failed node, whether due to bad weather or even vandalism, should not render all of the other nodes useless. Rather, the network should continue operating internally until connection can be reestablished with other nodes, or the situation can be avoided entirely by providing connections with other nodes, or even by wormholing across the mesh network via cellular data.

A mesh network that does not reach anyone has no users to carry load for, and thus becomes a functionally useless work of modern art. No, your users will not install an app from the app store – besides, with what Internet? – or buy a $50 pen-sized repeater from you. They want to get close to a hotspot – perhaps a few blocks away in Ponce – and let relatives all the way in Rio Piedras know that they are safe. And to maximize reach, of course, you need high-gain antennas to make 10-to-15-mile hops between backbone nodes that carry most of the traffic, which then distribute the traffic to subsidiary nodes down near town centers using omnidirectional antennas.

A mesh network that takes too long to deploy will not find much use in times of disaster. Cellular companies work quickly to restore coverage – a mesh network simply cannot beat cell coverage once it has been reestablished. First responders will bring satellite phones, and the chances of switching to an entirely new communication system will dwindle as the weeks pass and volunteer workflows solidify.

How do I wish to achieve these mesh networking goals?

  • Resilience – use Elixir and Erlang/OTP to build fault-tolerant systems and web servers that can shape traffic to accommodate both real-time and non-real-time demands. For instance, there could be both voice and text coming through a narrow link, which could be as low as 20 Mbps. There may also be an indirect route to the Internet, but not enough bandwidth to route every user out to it. Moreover, there exist decentralized data structures that can be split and merged, in case new nodes are added or nodes become separated in an emergency, even with delayed communication between nodes over an unreliable data link (a minimal sketch of such a mergeable structure follows this list).
  • Reach – allow users to reach the access point via conventional Wi-Fi or cellular radio, and connect via web browser. Nodes use omnidirectional antennas for distribution and high-gain antennas to form a backbone that can span dozens of miles.
  • Time to deploy – use off-the-shelf consumer hardware and allow flexibility in choice of hardware. Make the specs open for anyone to build a node if desired. Pipeline the production of such nodes with a price tag of less than $400 per node.
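
To make the “split and merge” idea concrete, here is a minimal sketch of a state-based grow-only counter, one of the simplest mergeable (CRDT-style) data structures, written in Python purely for illustration; the class and names are mine, not part of any existing plan.

# Minimal sketch of a state-based grow-only counter (G-Counter CRDT).
# Each node increments only its own slot; merging takes the per-node maximum,
# so replicas that were split apart can be rejoined in any order without losing updates.
class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> highest count observed from that node

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Commutative, associative, idempotent: safe over unreliable, delayed links.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

a, b = GCounter("ponce"), GCounter("rio-piedras")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5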

I imagine that the mesh network will predominantly serve a disaster-oriented social network with various key features:

  • Safety check – when and where did this person report that they were okay or needed assistance?
  • Posts – both public and private
  • Maps – locations that are open for business, distress calls, closed roads, etc.
  • Real-time chat (text and voice)
  • Full interaction with the outside world via Internet relays
  • Limited routing to specific websites on the open Internet, if available (e.g. Wikipedia)

One issue with this idea, I suppose, is the prerequisite of having a fully decentralized social network, which has yet to be developed. But we cannot wait until the next big disaster to begin creating resilient mesh networks. We must begin experimenting very soon.

Threading in AC

Last time I read about threading, I read that “even experts have issues with threading.” Either that’s not very encouraging, or I’m an expert for even trying.

There are a bunch of threads and event loops in AC, and the problem of how to deal with them is inevitable. Here is an executive summary of the primary threads:

  • UI thread (managed by Qt)
    • Uses asyncio event loop, but some documentation encourages me to wrap it with QEventLoop for some unspecified reason. So far, it’s working well without using QEventLoop.
    • Core runs on the same thread using a QPygletWidget, which I assume separates resources from the main UI thread since it is OpenGL.
      • Uses QTimer for calling draw and update timers
      • Uses Pyglet’s own event loop for coordinating events within the core
  • Network thread (QThread)
    • Runs its own asyncio event loop and communicates with the UI thread via asyncio futures and ad-hoc Qt signals (a minimal sketch follows this list).
    • Main client handler is written using asyncio.Protocol with an async/await reactor pattern, but I want to see if I can import a Node-style event emitter library, since I was going that route anyway with the code I have written.
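
For reference, here is a minimal sketch of the network-thread arrangement described above: an asyncio loop living inside a QThread and reporting back to the UI thread through a Qt signal. It assumes PyQt5, and NetworkWorker and message_received are placeholder names of mine, not actual AC code. Connecting the signal to a slot on the UI side delivers messages on the Qt event loop via a queued connection.

# Sketch: an asyncio event loop owned by a QThread, pushing results back to the
# UI thread via a Qt signal. Names are illustrative, not AC code.
import asyncio
from PyQt5.QtCore import QThread, pyqtSignal

class NetworkWorker(QThread):
    message_received = pyqtSignal(str)  # delivered to UI-thread slots via queued connection

    def run(self):
        # The loop is created and owned entirely by this thread.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        loop.run_until_complete(self._main())

    async def _main(self):
        while True:
            await asyncio.sleep(1)              # stand-in for reading from a Protocol
            self.message_received.emit("ping")  # signal emission is thread-safe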

My fear is that the network threads will all get splintered into one thread per character session, and that Pyglet instances on the UI thread will clash, resulting in me splintering all of the Pyglet instances into their own threads. If left unchecked, I could end up with a dozen threads and a dozen event loops.

Then, we have the possibility of asset worker threads for downloading. The issue there is potential contention when multiple workers update the local SQLite asset repository; one way around it is to funnel all writes through a single queue, as sketched below.
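
A common way to avoid that contention, and probably the one I would reach for, is to give SQLite a single writer thread fed by a queue; the sketch below is illustrative only, and the assets table is a made-up schema.

# Sketch: serialize all SQLite writes through one dedicated thread and a queue,
# so download workers never touch the database connection directly.
import queue
import sqlite3
import threading

write_queue = queue.Queue()

def writer(db_path):
    # The only thread that ever owns a connection; workers just enqueue SQL.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS assets (name TEXT)")  # made-up schema
    while True:
        item = write_queue.get()
        if item is None:
            break  # shutdown sentinel
        sql, params = item
        conn.execute(sql, params)
        conn.commit()
    conn.close()

writer_thread = threading.Thread(target=writer, args=("assets.db",))
writer_thread.start()

# Any download worker can safely enqueue a write:
write_queue.put(("INSERT INTO assets (name) VALUES (?)", ("witness_stand",)))

write_queue.put(None)  # tell the writer to finish up
writer_thread.join()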

The only way to properly manage all of these threads is to take my time writing clean code. I cannot rush to write code that “works” because of the risk of dozens of race conditions that bubble up, not to mention the technical debt that I incur. Still, I should not need to use a single lock if I design this correctly, due to the GIL.

Gearing up

It’s time to start work on Animated Chatroom. It is a monumental project: the largest I have ever desired to undertake.

My resources are somewhat scarce, but it could be worse. The two resources I am shortest on are developers (human resources) and energy (something that tends to be inversely proportional to time). The developers I seek are either not competent enough to produce modular code, or they live in time zones so different that any coordination becomes complicated. My energy is drained by playing with my brother or doing real-life tasks that I have been postponing for too long, such as cleaning some things up.

There is another question that compounds my desire to do everything other than work on Animated Chatroom: where do I even start?

Well, let’s see what Torvalds has to say about his success:

Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you’ll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don’t think about some big picture and fancy design. If it doesn’t solve some fairly immediate need, it’s almost certainly over-designed. And don’t expect people to jump in and help you. That’s not how these things work. You need to get something half-way useful first, and then others will say “hey, that almost works for me”, and they’ll get involved in the project.

Okay, Benevolent Dictator Linus…

You start with a small trivial project, and you should never expect it to get large. If you do, you’ll just overdesign and generally think it is more important than it likely is at that stage.

All right, so we started with a small trivial project. It was called Attorney Online 2. It was good. And then it tanked because of poor design. I want Animated Chatroom to not go through that pain again.

Or worse, you might be scared away by the sheer size of the work you envision.

Which I am. All right, so what features do we not need? Let’s cut nodes until we get something less overwhelming.

Better. That’s almost the bare minimum that I need.

In list format:

  1. Core animation engine.
  2. Asset loader.
  3. Basic network.
  4. Basic UI.

That’s all, I guess I don’t care about anything else right now. So let’s cut it down even further.

Okay. So version 0.1 will barely have a UI. It’s just figuring out how stuff should work.

It’s clear that VNVM is at the center of this entire project. If I can design VNVM correctly, then this project has a chance; otherwise, a poor execution will lead to a shaky foundation.

The Visual Novel Virtual Machine

What is the purpose of the Visual Novel Virtual Machine project? The purpose is to bring animation and dialogue sequences, the bread and butter of visual novels, to a portable environment. From reverse engineering performed by others, it turns out that major visual novels also use a bytecode to control dialogue and game events. In the VNVM world, this is called VNASM (Visual Novel Assembly).

Within characters, emotes are simply small bits of VNASM, which are then called by the parent game (which also runs in the VNVM). Recording a game is just a matter of storing what code was emitted and when. The point is that essentially all VNASM is compiled or emitted by a front-end, rendering it unnecessary to understand VNASM to write a character. (But, it would kinda be nice to be able to inline it, wouldn’t it?)

This makes VNVM satisfactory for both scripted and network environments. In a network situation, where the execution is left open, there is a simple wait loop that awaits the next instruction from the network buffer. New clients simply retrieve the full execution state of the VNVM to get going. The server controls what kinds of commands can be sent to it; most likely, an in-character chat will look something like this as a request made to the server:

{ emote: "thinking", message: "Hmm... \p{2} I wonder if I can get this to work right..." }

The \p{2} marker denotes a pause of 2 seconds, which the server parses out and emits as a delay of 2 seconds (clamping the number to a reasonable amount, of course). The server then pushes a reference to the character who wants to talk, as well as the message to be said, and calls char_0b7aa8::thinking, where 0b7aa8 is the character’s ID. This means that the subroutine named thinking is located in a segment of VNVM code named char_0b7aa8. A rough sketch of that server-side translation follows.
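
Here is a rough, hypothetical sketch of that server-side translation in Python; the instruction names (push_char, say, delay, call) and the request shape are invented for illustration and are not a defined VNASM instruction set.

# Hypothetical sketch: turn an in-character chat request into VNASM-like instructions.
import re

MAX_PAUSE = 10.0  # clamp player-supplied pauses to something reasonable

def emit_chat(char_id, request):
    instructions = [("push_char", f"char_{char_id}")]
    # Split the message on \p{N} pause markers, emitting delays between text runs.
    for part in re.split(r"(\\p\{\d+(?:\.\d+)?\})", request["message"]):
        pause = re.fullmatch(r"\\p\{(\d+(?:\.\d+)?)\}", part)
        if pause:
            instructions.append(("delay", min(float(pause.group(1)), MAX_PAUSE)))
        elif part:
            instructions.append(("say", part))
    # Finally, invoke the character's emote subroutine, e.g. char_0b7aa8::thinking.
    instructions.append(("call", f"char_{char_id}::{request['emote']}"))
    return instructions

print(emit_chat("0b7aa8", {"emote": "thinking",
                           "message": "Hmm... \\p{2} I wonder if I can get this to work right..."}))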

More to follow later.

Rethinking Attorney Online assets

It seems fairly obvious that FanatSors was not expecting a complex level of gameplay when he released the Delphi-made Attorney Online back in 2012.

Yet AO still exists, with about 150-200 daily players who frequent a couple dozen servers, the most popular being /aog/’s Attorney Online Vidya. OmniTroid’s Qt-based, open-source AO2 client is now the de facto client, touting “advanced” features such as color, Unicode support, and parametrized preanimations. Likewise, the open-source tsuserver3 and the esoteric-but-still-open-source serverD are the two choices for hosting an AO server. Today, however, the legends of FanatSors and OmniTroid have faded away.

https://i.imgur.com/V30YtuL.png
An overview of the Attorney Online family.

Most players and case-writers are regularly impacted by the technical limits and quirks of the engine. Configuration of each character is done entirely in a single INI file, which defines each emote as an octothorpe-delimited sequence of animations to be played. Each animation refers to a GIF file prefixed by an (a) or (b); that is, the format and naming scheme must be precise at the file-system level.

This is not the main challenge, however. The challenge is managing assets.

Every server asks its users to download an archive, spanning up to 7 GB in size, containing character sprites, music tracks, backgrounds, evidence images, and sound effects that may be needed during gameplay. Assets are identified only by their folder name; this is the only unique identifier attached to an asset. These are the problems with using an internal name as the sole identifier:

  • Two servers may offer different content, but under the same internal name, causing a hard clash. Content could be isolated per-server, but this causes a serious redundancy problem.
  • Two servers may offer the same content, but under different internal names. This causes excessive redundancy.
  • Two servers may offer just about the same content, with a small difference. In this case, there is no hierarchy established as to which one is derived from the other one.

After requesting a character list for my proposed new standard base, nuVanilla (and receiving a monumental list!), I felt that the assets needed to be examined along too many dimensions for a conventional spreadsheet, so I opted for a full-blown database. My choice was split between MySQL/MariaDB and PostgreSQL, but I remembered that I wanted to learn Postgres, whose claimed performance and versatility are far greater than what MariaDB can offer.

One immediate issue is the sheer number of many-to-many relationships that manifest, effectively decoupling many columns (a sketch of the join tables this implies follows the list):

  • Multiple packs can include the same asset.
  • The same asset could be under different internal names.
  • Multiple assets can have the same internal name.
  • Multiple assets can represent the same character.
  • Assets of the same character can come from different games.
  • Assets can be in different formats, such as 256×192 or even 1280×720 (yes, some people resize their sprites to match their theme’s viewport size).
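
To make those relationships concrete, here is an illustrative sketch of the kind of join tables they imply. I am using Python’s sqlite3 here purely for brevity, even though the choice above was Postgres, and every table and column name is mine.

# Illustrative schema sketch for the many-to-many relationships listed above.
# Written against sqlite3 for brevity; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets     (id INTEGER PRIMARY KEY, content_hash TEXT);
CREATE TABLE packs      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE characters (id INTEGER PRIMARY KEY, canonical_name TEXT, game TEXT);

-- one asset can appear in many packs, and one pack contains many assets
CREATE TABLE pack_assets (pack_id INTEGER, asset_id INTEGER,
                          PRIMARY KEY (pack_id, asset_id));
-- the same asset may be known by several internal names, and vice versa
CREATE TABLE asset_names (asset_id INTEGER, internal_name TEXT,
                          PRIMARY KEY (asset_id, internal_name));
-- several assets (different resolutions, edits) can represent one character
CREATE TABLE asset_characters (asset_id INTEGER, character_id INTEGER,
                               PRIMARY KEY (asset_id, character_id));
""")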

How do I represent uniqueness of assets, then? I can’t even hash the char.ini, because the char.ini contains the internal name of the asset. What’s more, there is no standard way to hash multiple files at a time; in this case, I would want to hash all of the emote images at the same time. (For now, however, I am hashing the char.ini until I find an adequate solution.)
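
For what it’s worth, one workable convention is to hash the files in a deterministic order, mixing each file’s relative path and contents into a single digest; a minimal sketch of that idea (my own convention, not an established standard):

# Sketch: a deterministic combined hash over a directory of files.
# Sorting by relative path and mixing the path into the digest makes the result
# independent of traversal order, without relying on the asset's internal name.
import hashlib
import os

def hash_tree(root):
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest.update(rel.encode("utf-8"))  # which file this is
            with open(path, "rb") as f:
                digest.update(f.read())         # and its contents
    return digest.hexdigest()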

One solution would be to give every asset a UUID. This would, in theory, add an additional layer of “uniqueness” into each asset. However, this still does not resolve the original problem: two assets with the same content but different internal names would still be detected as “different” upon submission, since the hash of each char.ini would be different. And this would compound a new problem on top of the old one: modifiers of an asset would be burdened with updating the UUID of the asset they are editing; forgetting to update it could only cause an error when uploading it to some centralized database.

What modifications can be done to an asset?

  • A small correction to frame data – minimal change
  • Emote additions – significant change
  • Internal name change – minimal change
  • SFX name change – minimal change

Three out of four of these changes are minimal changes. Thus, it would not make sense to consider them completely different assets. We can try to establish a hierarchy of assets, to see which asset succeeded the other, but that is no substitute for a diff. The data stored remains redundant.

Therefore, I can conclude that Attorney Online assets cannot be uniquely and accurately identified for management purposes, and attempting to set up a database to manage them would take me nowhere.

I should then refocus my efforts on designing the asset structure in Animated Chatroom.

Each asset would have a definition file (such as char.json), which would state the name of the character, its ancestors, and a reference to the sprite file. The format of the definition file would probably be JSON, while the sprite file would then be written in something like Spritelang.

The asset is then bundled using tar and signed using GPG. This verifies the identity of the packager of the asset (for increased trust, the packager’s key can be cross-signed by a responsible admin, who in turn is cross-signed by the Animated Chatroom Root Key). All of these keys can be uploaded to a general-purpose key server, like pgp.mit.edu. The signature of a package need not be made specifically by the AC official root key; any key that is cross-signed by someone on the keychain will do. The absence of a signature does not mean that the asset contains malicious content and therefore cannot be trusted; rather, the purpose of the signature is to assure that the contents of an asset have not been modified, and to seal the credits of an asset. After the tar has been signed, it can be compressed in a desired format such as xz (which is basically 7-Zip, but operating on a byte stream as opposed to an embedded archive). In this case, the unique identifier of the asset is the key ID of the GPG signature. This is the strongest possible hash: not only is the data factored in, but also the identity of the individual who authored the asset.
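
As a rough sketch of that packaging pipeline in Python (assuming gpg is installed and a signing key is configured; the directory and file names are hypothetical):

# Sketch: bundle an asset directory, detach-sign the tar with GPG, then compress with xz.
import lzma
import subprocess
import tarfile

def package_asset(asset_dir, out_base):
    tar_path = out_base + ".tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(asset_dir, arcname=".")

    # Produces out_base.tar.asc, an ASCII-armored detached signature.
    subprocess.run(["gpg", "--detach-sign", "--armor", tar_path], check=True)

    # Compress the already-signed tar with xz.
    with open(tar_path, "rb") as src, lzma.open(tar_path + ".xz", "wb") as dst:
        dst.write(src.read())

package_asset("phoenix_wright", "phoenix_wright")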

Now that we have established a strong, unique identifier to our data, we need to solve the data redundancy problem.

Children of ancestors use an incremental tar file, which minimizes the content stored in the child. Even deleted content can be tracked if incremental tars are applied correctly. We may also employ a scheme similar to the Docker Image Specification (version 1), but it’s clear that someone did not read up on tar’s ability to do incremental archiving out of the box. (“NOTE: For this reason, it is not possible to create an image root filesystem which contains a file or directory with a name beginning with .wh..”) It requires little thinking to realize that this is a tremendously absurd limitation, just to add the ability to identify deleted objects. If anything, deleted files should have been listed in a separate file, for the sake of not introducing an artificial limitation on the file system.

Incremental versioning has a major implication for authoring assets: authored assets are immutable. No, sir, the Animated Chatroom authoring tool will not allow you to modify an asset that has already been successfully packaged and signed, unless you choose one of the following options:

  • Create an asset that is a child of the asset you want to modify. This is not a favorable option if you have not published the parent asset. However, the authoring tool will set the hierarchy up for you.
  • Create a new asset, derived from the data of the parent asset. There is no hierarchy established; it’s just a hard copy of the parent asset. This is favorable only when the parent asset has not been published.

Acquiring modified versions of an asset would be simple under this system (a sketch in code follows the list):

  • The desired version of the asset is downloaded and decompressed. (We don’t need to untar the entire asset yet.)
  • The signature is verified, and a warning is displayed if the signature is invalid.
  • The definition file is untarred and checked for an ancestor. If an ancestor exists, a recursive download request is made on the ancestor.
  • The asset contents are untarred on the target folder.
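
A hypothetical sketch of that acquisition procedure in Python; the repository URL scheme, file names, and definition-file layout are all placeholders of mine:

# Hypothetical sketch of recursive asset acquisition: decompress, verify,
# fetch the ancestor first, then unpack over it.
import json
import lzma
import subprocess
import tarfile
import urllib.request

REPO = "https://assets.example.net"  # placeholder repository

def fetch_asset(key_id, target_dir):
    # 1. Download and decompress (we still don't untar everything yet).
    with urllib.request.urlopen(f"{REPO}/{key_id}.tar.xz") as resp:
        tar_bytes = lzma.decompress(resp.read())
    tar_path = f"{key_id}.tar"
    with open(tar_path, "wb") as f:
        f.write(tar_bytes)

    # 2. Verify the detached signature; warn but continue if it is invalid.
    sig_path = tar_path + ".asc"
    with urllib.request.urlopen(f"{REPO}/{key_id}.tar.asc") as resp, open(sig_path, "wb") as sig:
        sig.write(resp.read())
    if subprocess.run(["gpg", "--verify", sig_path, tar_path]).returncode != 0:
        print(f"warning: signature on {key_id} is invalid")

    # 3. Read the definition file and recurse into the ancestor, if any.
    with tarfile.open(tar_path) as tar:
        definition = json.load(tar.extractfile("char.json"))
        if definition.get("ancestor"):
            fetch_asset(definition["ancestor"], target_dir)
        # 4. Finally, unpack this asset's contents over the ancestor's.
        tar.extractall(target_dir)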

Asset servers for Animated Chatroom web clients can establish this hierarchy – without the expected redundancy! – by using symbolic links to represent files that are identical to the parent.

Finally, we can track what assets we have downloaded and what asset repositories we are currently using, by storing local data in an SQLite database file.

Instead of a name-based character list, servers use character IDs to disambiguate between different versions of the same character. A server can then offer download sources for specific characters, such as if a character was made “in-house,” so to speak.

What are the improvements of this design over the old design devised by FanatSors?

  • Authorship is immutable. This is important mostly for original content: repository owners will refuse to publish assets that fail to identify the original creator of the content. However, in the case of ripped content, it is desirable to preserve information about who ripped it, but ultimately, it is all copyrighted by the publisher of the game (Capcom, Chunsoft, etc.).
  • Downloading is automatic. Under the old system, players were too lazy to download zip files to play on new servers, and server owners were too lazy to compose zip files for players to download every time new content was suggested. Now, server admins need only do a graphical lookup of the assets they want to add to the server and confirm the additions, and the new content is immediately requested for download by clients, all seamlessly and in the background.
  • Name clashing is no more, since we established that internal names are no good as a unique identifier.
  • Asset content is deduplicated (to the best of the ability of this system).
  • Asset management is decentralized. I don’t own the database – in fact, nobody does. You can host part of the repertoire of Animated Chatroom content, but you can never host all of it.
    • On a similar note, this makes Animated Chatroom effectively immune to cease-and-desist notifications. I can take down the offending content on my servers, but due to technical restrictions, I cannot be responsible for the content hosted by other servers. The cease-and-desist notifications would have to be sent to each offending server.

This concludes an overview of the proposed design of asset management in Animated Chatroom. I hope you found this design enlightening for any future adventures in software development.

Reverse engineering LIMG

LIMG is the custom image format used in Professor Layton and the Unwound Future. Like many Level-5 shenanigans, it’s a custom format for absolutely no reason other than to serve as a deterrent for future reverse engineers (no pun intended).

After taking Tinke and applying it to the game’s ROM image, we can easily find that the animations are CANIs, which get decompressed via LZ10 and passed through a QuickBMS script to be split into LIMGs. But what now? Tinke does not recognize the LIMG format, and all we get is an incoherent mess, which we can explore by opening the file as either a tile or a map. This process works best with small images.

We can, for instance, open up /lt3/menu/level5_a.limg and expect the Level-5 wordmark. We can get a recognizable wordmark using an offset of 0x680, a 128×16 image size, a horizontal image pattern, and an 8×8 tile size, though still with some artifacts and wrong colors.

We can then take a look at the LIMG in hexadecimal, and try to find these values. We start off with the FourCC (LIMG), then what seems to be the 32-bit base offset value, … and then what?
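
For anyone following along, here is a minimal sketch of peeking at that much of the header with Python’s struct module; only the FourCC is certain, and treating the next field as a 32-bit little-endian base offset is just the guess described above.

# Sketch: read the start of a LIMG file. Only the FourCC is known for sure;
# the following 32-bit value is assumed to be the base offset.
import struct

with open("level5_a.limg", "rb") as f:
    header = f.read(8)

fourcc, base_offset = struct.unpack("<4sI", header)
assert fourcc == b"LIMG"
print(f"assumed base offset: {base_offset:#x}")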

That’s where I’m stuck. I need to know where the palette data is, etc.

Visual Novel VM

I’ve been feeling somewhat delusional lately because of this. I am not sure if it is a good idea because (1) it has never been done before; (2) it seems like a misapplication of an interpreter/state machine model; and (3) it takes some work to set up.

In theory, you could turn any sequence of API calls and basic arithmetic into a machine-readable language. Not all encodings are Turing-complete instruction sets, but a number of them are. For instance, I discovered that PostScript is actually a Turing-complete, stack-based language. You can do math in PostScript, define functions, and do things that are far beyond the scope of executing print jobs. If you keep this in mind, you can find that an instruction-based model is not a misapplication for a scripted visual novel. The other question stands, however: is it practical?

The goal behind VNVM is to make visual novels self-contained and portable by adding a layer of abstraction over the behavior of sprites, with the ability for instructions to be sent over the network without any danger of exploitation. While sandboxing is achievable with Lua and Python, I would end up placing an excessive amount of emphasis on the security layer to prevent arbitrary code execution, which is ridiculous to have to defend against when their bytecode is far more complicated than what I really need. Moreover, using Lua in tandem with Python already raises eyebrows: why use two different scripting languages?

A side effect of this project is that you would be able to write visual novels in assembly, if you so choose. You can also port the VM to the browser, to a GBA, to any platform you can play games on. But the final goal is to make any sort of visual novel play faithfully, without the end user needing to see the backend at all.

Delusional and worried that I was wasting my time, I wondered why the instruction set was so complicated. After taking some inspiration from PostScript, I realized that this VM approach, albeit somewhat strange, is workable. I just had to change the instruction set from register-based to stack-based, an easy change that eliminated about 100 lines from my code.
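
To illustrate the appeal of the stack-based route, here is a toy stack-machine evaluator in Python; the opcodes are made up for the example and are not VNASM.

# Toy stack-based evaluator: operands live on one stack, so instructions
# need no register operands. Opcodes here are illustrative, not VNASM.
def run(program):
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "say":
            print(stack.pop())
    return stack

# Computes 2 + 3 and "says" the result.
run([("push", 2), ("push", 3), ("add",), ("say",)])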

My main target right now is Ace Attorney. Currently, case engines are written in exotic languages, such as Delphi, Multimedia Fusion, or some “custom-made” language that looks very similar to AutoIt. VNVM, I admit, is no exception; however, my main argument is that its applications surpass Ace Attorney, and I plan to automate the construction of a faithful case engine by creating a Python script that emits VNVM machine code. I have already written an assembler; now I intend to write a script that uses inline assembly to directly produce machine code.

One useful application of VNVM is writing the behavior of sprites. Users of Attorney Online are accustomed to using .ini files to define emotes, sounds, and delays. But from a technical standpoint, this system is bunk: what if I want my sprite images to be organized some other way in the file system? What if I need 8-bit transparency in my animations? What if there is something floating around me while I do everything? It is clear that .ini files cannot cover all cases, and with the introduction of even a little complexity, the system falls flat.

My idea was a domain-specific language called Spritelang. This is a macro-based language, which, unfortunately, has rather complicated syntax rules, surpassing my knowledge of compiler theory and the writing of tokenizers. I would certainly write it in JetBrains’ MPS, but I underestimated the notes when they said “this might take about a day’s worth of your time.” The meta-language is not an easy one to understand, but I can sense its immense power currently beyond my reach.

A successful migration

I migrated to the Raspberry Pi successfully. Most of the work was getting the Raspberry Pi back in working order after I accidentally broke the Raspbian installation with an incomplete update that paused right at the libc6 installation two weeks ago.

Most people will tell you that when things start to go awry and you don’t know why, just reinstall everything. The advice, while it may work, is incredibly unhelpful because they fail to understand all the work and reconfiguration that does not get magically copied over on reinstallation. I think it’s just a byproduct of today’s consumerist thinking: don’t bother repairing, just replace. Who cares where the broken stuff goes? The death of shop classes, as I had previously witnessed, is simply further evidence of this consumer culture.

And if you start whining, they’ll tell you to shut up, “managing a server yourself will take many hours,” “read this basic tutorial to the letter,” or “your posts show no research, get out.” And if you really do start making complicated arguments like the absence of fast reconfiguration on reinstall, they’ll tag out and bring in the intellectuals who will evangelize Docker and “write-as-you-go Bash scripts” at your face; “that, or consider paying someone to manage it for you.” What is more insulting than someone telling you outright that acquiring expertise is futile, and to simply give money that you don’t have to an “expert”?

Linux forums depress me, as the “pros” come off as, for lack of a better word, elitist. (There’s definitely a better word to describe them, but I can’t think of it right now.) Basically, if they don’t recognize you or your post count seems fairly low (less than 1,000), they’ll intentionally ask for information about your system beyond what is actually needed.

Anyway, back on topic. In short, I had to reinstall Raspbian, but I first had to back up the partition, which I thought I could do via SSH, but it seems that SSH would randomly halt with “Illegal instruction” around 20% of the way into the transfer. NOOBS doesn’t have SSH or any other file-transfer method (except netcat, maybe), so I just put the SD card in my laptop and transferred the files out. Then, I reinstalled Raspbian.

The configuration process was easier than I had originally conceived. You just install nginx, PHP, and MySQL (which is just a meta-package to install MariaDB – I didn’t know it was a drop-in replacement!), and then you copy in the site’s .conf file to /etc/nginx/sites-available and symlink it to /etc/nginx/sites-enabled. Next, you copy the site data to /srv/[site-name] – don’t copy to /var/www because that’s not the point of /var according to the Linux FHS (but packages aren’t allowed to automatically configure to /srv because the directory’s organization is at the sysadmin’s discretion). Finally, you make www-data take ownership of certain WordPress files, such as everything in wp-content and wp-config.php. To set up the database, just mysqldump from the original server, transfer the query data over to the new server, and pass it as standard input. Don’t forget to re-create the WordPress database account.

Meanwhile, the PowerEdge 2600 is on the chopping block. DIMM bank 1 has been reporting an ECC fault since June, and ever since I restarted the server a few weeks ago, the first drive of the array has been reporting predictive failure, which will put the array into degraded mode once it fails.

However, why should I put my entire server at the mercy of an SD card that can fail at any time? Hence, I should make a backup configuration now, before it is too late…

Update: I listened to the server spin down for its penultimate time: the PowerEdge has been shut down for good. The RPi has also been set up to perform daily backups of directories with user data and non-default config data on them.

I’ve sunk way too much time into this endeavor, and now I am very behind on my homework. I’ll have to work almost nonstop through the night if I want to get that homework done in time.

On top of that, now I must do a backup of the PowerEdge without arousing questions of what I actually used the server for. It’s almost disgusting how long I convinced myself to keep using it, and how much money my dad probably ended up spending over the years just for power. A NAS would have been a far better investment than that pile of junk; now the server is only worth the metal it is made of.

Making an e-bike with display

This is an explanation of another one of those ambitious projects which I really want to do, but I have neither the experience nor the people to actually do it with.

I hate rough inclines: they kill my legs. The number one deterrent to riding a bike in my childhood was that my neighborhood has some very steep inclines. They made riding a bicycle not a very pleasant experience, and my father never wanted to bring me to a park to ride in, so in the end, I never really used my bike.

However, given the fact that using a bicycle is the only practical mode of rapid transit in the city where I attend college, I want to actually start riding a bike again. And after a year or so of riding that bike, I want to make the riding experience cooler.

First, I want to retrofit a brushless DC motor to the drive shaft; something rated for around 600 W of power output. If it is not possible to attach it directly to the hub, I’ll attach it to the shaft with a belt; ideally, a belt with the quality of a timing belt. But I hope I don’t have to do this, because if so, I’d have to play with the tension, pitch, and so on of the belt, which would be problematic.

Next would be the electronic speed controller and charge controller. I want the controllers to automatically switch to a regenerative mode for slight brakes by bypassing the ESC, inverting the poles of the motor, and taking the current straight to the charge controller. Then, on pedaling, the controllers should switch back to drive mode. This behavior would be directed by the main controller, since regenerative braking is a non-essential feature.

Speaking of a main controller, what exactly is it? The main controller is the Arduino or whatever microcontroller I decide to use, wired to the ESC and charge controller, but not required to be running in order to operate the bike, in case of a fatal error or low battery charge. It would run a real-time operating system with prioritized continuous tasks and many, many interrupt routines. These would be its high-level tasks, in order of descending priority:

  1. Emergency brake applicator. Continuously checks the “emergency stop” button, the dead man’s switch (clipped to clothes, with a clamp limited enough that it cannot be clipped to the handlebars or another part of the bike; the other end of the clamp is magnetically attached to a port on the control box), and whether the brakes are applied more than 95% while moving at a formidable speed.
  2. 10 Hz alternating pulse. This signal is generated and passed through some kind of failsafe circuit, which then determines whether or not the ESC should be enabled. The alternating pulse ensures that the main controller is not “frozen” on an operation that could prevent it from stopping the motor; the assumption is that as long as the pulse is alternating, the controller is working as intended. (A minimal sketch of this heartbeat follows the list.)
  3. Speedometer. It simply samples the speed at which the back wheel is spinning and determines the current speed.
  4. Speed regulator. This task scales back the output DC current based on how close the bike is to the speed limit. This task can be overridden, but it’s not a good idea to do so.
  5. Brake detector. This task detects the brake application percent. The actuation of the brakes is completely analog, but if it is significant, the main controller can signal to go to regenerative mode.
  6. Pedal detector. This task simply detects how much positive force is being applied on the pedal and sets the target DC current proportional to this force (clamped, of course).
  7. Odometer. It uses the same sampling as the speedometer, but it increments the distance by the circumference of the wheel. After around 0.2 miles, it writes to the EEPROM. I suppose I could use a pointer to level the wear on the flash, or I could use a preexisting file system designed specifically for microcontrollers.
  8. Display driver. This assumes that there exists a layer of abstraction between the UI and the display itself.
  9. Sound driver. Just for basic beeps and boops.
  10. Main UI. This handles button interrupts (the calls of which are passed to the foreground user task), the failsafe UI (if all user-mode UI tasks are dead), and the UI toolkit itself.
  11. Foreground user task. Dashboard, options, etc. Must not directly control motor operation.
  12. Background user tasks. Battery icon, clock, etc. Must be non-critical.
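
Since the other sketches on this blog are in Python, here is the heartbeat idea expressed as a MicroPython-flavored sketch; the pin number, the failsafe wiring, and run_due_tasks() are assumptions of mine, not a real design.

# MicroPython-flavored sketch of the 10 Hz heartbeat: the main loop toggles a pin
# every 50 ms, and an external failsafe circuit enables the ESC only while it keeps
# seeing the alternation. The pin number and run_due_tasks() are hypothetical.
import time
from machine import Pin

heartbeat = Pin(2, Pin.OUT)  # wired to the failsafe circuit (hypothetical pin)
state = 0
last_toggle = time.ticks_ms()

def run_due_tasks():
    pass  # placeholder for the prioritized task dispatch described above

while True:
    run_due_tasks()
    if time.ticks_diff(time.ticks_ms(), last_toggle) >= 50:
        state ^= 1
        heartbeat.value(state)  # stops alternating if this loop ever hangs
        last_toggle = time.ticks_ms()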

The e-bike’s main controller would require a key for operation, plus a simple on/off SPST switch located in front of the handlebars. The display would ideally be a Hitachi HD44780-esque LCD, but it could also be one of the Nokia-style LCDs, although those might be a little too small. There will be six buttons: on the left, below the display, four directional buttons laid out horizontally (in a style familiar to Vim users or Dance Dance Revolution/StepMania players), and on the right, a back button and an enter button. The display and controls need to be waterproofed.

Instead of using heavy deep-cycle lead-acid batteries, I’d just opt for using LiPo cells, which are ubiquitous in hobby usage for high-performance electronics. Industry professionals are not fond of LiPo cells because they are comparatively more dangerous and volatile than other types of cells, and this increased risk cannot be tolerated in mass production. However, since I am not mass-producing e-bikes, it should be OK to accept the risks and enjoy the power of lightweight LiPos, as long as their charging is supervised closely.

This e-bike also needs a brake light, signal lights, and an LED headlight with a white color temperature rather than blue.

That’s all I want the bike to do. All of this, while keeping it street-legal and being able to prove that it can be safely ridden on busy streets, given its various fail-safe mechanisms, including a speed regulator that requires manual override.

Sadly, I don’t know if I will ever be able to make this contraption.

Where’s the good backup software?

For *nix users, the answer is easy: rsync. For Macintosh users, the answer is even simpler: Time Machine (“time ‘sheen”). For Windows, the answer is a convoluted mess of choices. And the problem is that none of those choices give everything you want.

Why can’t you have everything? Here are all of the things a backup program needs:

  • Permissions. If you can’t preserve your metadata, forget about making faithful backups. POSIX and Windows permissions are very different, but they still deserve the same love.
  • Resilience. The restore part of a program should never produce a fatal error, unless a backup has been corrupted beyond repair. If a part has been corrupted, ignore that part, notify the user that a corrupted portion was ignored (noting, of course, what the corrupted portion actually is), and continue with the restore process.
  • Compression. Many would argue that compression only makes the backup more difficult to restore, yields a minimal return in efficiency, etc. However, this can make a large difference when uploading from a personal home network to a storage service, where storage costs are billed by the gigabyte. I don’t know about you, but $1 a month was more than my tax return this year.
  • Encryption. Everyone’s got their tinfoil hats on, how about you?
  • Incremental backups. People are not going to do full backups every week. This is a waste of time, storage space, and energy, since most files would be redundantly stored.
  • Block-level. If you modified a 20 GB VHD file, are you going to copy the whole thing on every weekly incremental backup? No, you’re going to copy only the blocks/parts of that file that changed (see the chunk-hashing sketch after this list).
  • Archivable. It appears most people choose either image-based backups or file-based backups. I personally prefer working at the file level, but this should not mean “copy millions of files and spew them onto the target directory.” The backup should be neatly organized in, say, 50 MB parts that can be easily uploaded to a cloud service as part of a future backup plan. Or, it can just be made as a monolithic 800 GB file. The former is workable with most consumer file services, while the latter is most convenient for more enterprise-oriented services like Amazon Glacier.
  • Resumable. Most backup programs hate it when you shut down your computer for the night. Yet none of them seem to understand that this is exactly what shadow copies are for. Even after shutting down the computer, shadow copies do not magically change. Yet the software goes, restarts your entire backup, and creates yet another useless shadow copy for the mere sake of not wanting to touch files in use and making the most up-to-date backup possible.
  • Snapshots. Let’s say I don’t want to restore my whole computer; I just want to see an old file and its version changes over time. Most backup programs will not let you do that, citing that it is “too complex.” No, it’s not. Track the files the software backed up, using a tiny database like SQLite. There, you can store checksums, file sizes, previous versions, and so on and so forth. The suffering ends there. The end user can view a snapshot of the computer at a certain point in time, or view the history of a specific file, perhaps with diffs (binary diffs if the backup software is user-friendly enough).
  • Low profile. What is CloudBerry Backup using 2.7 GB of memory for? Just flapping around? No! Decent backup software should use 100 MB of memory, tops. Leave the heavy RAM consumption to browsers, games, and servers.
  • Integration. This backup software should be robust enough to make anything either a source or a destination for backups, notwithstanding the limitations of each backup medium.
    • Least liquid: Offline local storage; Amazon Glacier; Google Coldline
    • Somewhat liquid: FTP (due to its slow transfer speed of many files and inability to perform multipart transfers); most consumer storage services
    • Most liquid: iSCSI SANs; high-availability storage services
  • Drive path-agnostic. Backup software should never, ever depend on drive letters to figure out backup sources and targets.
  • Predict drive failure. This goes somewhat beyond the scope of backup software, but there should be at least some kind of periodic SMART monitor to inform and warn a user of a drive that is indicating signs of failure. Yes, put a big popup on the notification bar with a scary message like “Your drive might fail soon” or just outright “Your drive is failing.” Show it to them the first three days, make it go away, and then show it to them the next week. Of course, the notification can be removed for a specific drive, but it will require them to read a message about possibly losing data on the failing drive, wait 5 seconds to close the dialog, and now they never have to see the dialog for that drive again.
  • Recognize cache folders. Here’s what you need to do: just stick that CCleaner scanning stuff into your product. Make the default backup plan ignore whatever CCleaner would usually clean up. Caches can add up to be gigabytes of size, and many users do not even care about including them in their backups, because all they want are their programs and documents. However, there is that one company that might say, “no you can’t ignore cache folders because we need a perfect file-level backup of the system tree.” (My argument would be to use CloneZilla and do it at the image level – but fine.)
  • Import from other services. No, I don’t care much about Acronis, Veeam, or other proprietary solutions. What I do care about, however, are the crappy Windows 7 Backup and Restore backups, dd “backups,” and other image-level backup formats. Don’t just import the backups: import file history, recompress them, preserve timestamps. Give them the full treatment, and put them neatly in the new backup format as if it really were an old backup.
  • Responsive (and responsible) backend. Big enterprise backup software uses a UI frontend, which merely communicates with the service backend. This is generally a good design. However, when the backend decides to quit, the UI frontend goes into limbo and stops responding to commands instead of explaining what is happening with the backend, while the backend makes no attempt to halt whatever blocking operation is taking too long. The gears just grind to a halt, and nothing can get done on either side.
  • Don’t delete anything without asking. No, I don’t even want an auto-purge functionality, and if you do, for the love of God, make it a manual operation. There is no reason to keep purging things constantly, unless you have a disk quota to work under – in that case, the software should determine what is best to purge (start with the big stuff, at the earliest backup) to meet the size requirement.
  • Only one backup mode. That backup mode had better be good, and it should have a hybrid format.
  • Open-source format. The software itself may not be open-source, but you are essentially ensuring that someone out there can make a restore software that can always be compatible with the latest and greatest operating systems.
  • Bootable. Where are you going to make your restores from? A flash drive running Linux with an ncurses interface for your backup software, obviously. You could, of course, allow backups from that same bootable drive, in the case of an infected drive or as part of a standard computer emergency response procedure – but eh, that’s really pushing it. Just restores will do fine.
  • Self-testable. Make sure the backups can actually restore to something.
  • Exportable. One day, your backup software will not be relevant anymore, so why bother locking in users to your format? Make it so that they can export full archives of their backups, with a CSV sheet explaining all of the contents of each archive.

At the end of the day, users just want their files safe and sound, so keep the software as close to the fundamentals as possible, and allow others to make tools around the backup software if additional functionality is needed.