Google as a news aggregator

It turns out, after all, that Google Reader was shut down only for news aggregation to become Google Search’s primary function.

I have been having substantial trouble these days finding meaningful, relevant information, especially from smaller blogs and old university websites. Instead, Google tends to show only the top results from the most prominent 1,000 websites, even if the results are not relevant.

I look up the phrase “moral bankruptcy.” Google boasts millions of results, but it only presented to me about 5 pages (232 results), most of which interestingly enough were strongly worded articles from The New Yorker, New York Post, The Atlantic, relating to current events. The algorithm is working because one of the results is an old-looking university page, another is a PDF, but where are the rest of the results? What if I want to search deeper?

What if I want to only show results from small blogs (without naming any particular domain names)? What if I want to show results from websites that don’t exist anymore, such as GeoCities? What if I also want to connect a well-known library database, such as WorldCat? What if I want to search and visualize the plethora of social media results as well?

The point is that in an ever more complicated Internet, Google Search is an insult to the task of finding the world’s information. Searching the Internet deserves a better experience.

It may well be possible that over 50% of pages that the Internet has ever hosted have been deleted, though an exact number would require further research. Google Search does not reflect this fact. Instead, it suffers from an acute case of survivorship bias when it ranks the longest-living, most content-rich websites at the top, which are usually the largest, most financially secure websites.

Then there’s also geospatial data: businesses, historic landmarks, municipal zones, festivals, traffic, relative popularity/density (in other words, “where is everyone at/what is everyone doing right now”), geocaches, and other importable GIS data that could probably be aggregated and organized.

My conclusion is that Google gave up on Search a very long time ago; it’s now just a facade for its massive advertising and data-mining operations.

If someone wanted to make an even better search engine for the modern Internet, the market is open for that. Many scholars and researchers are thirsty for an elegant tool that allows them to deep-search all archived and printed content that has ever existed, not just what is recent, trendy, or popular.

There are some problems to be overcome, of course:

  • Physical (library) media cannot be ranked the same way as online media. The PageRank algorithm, Google’s winning formula for a meaningful search engine, requires hyperlinks between pages to be recognized. Instead, physical media has to be ranked based on popularity (how many patrons have checked it out, how many clicks it has received) or by having a natural language processor that is robust enough to understand cross-references and citations to other works and people, even outside the context of scholarly articles. Fortunately, advanced natural language processing is approaching our reach.
  • It’s difficult to come up with resources that can index today’s massive Internet. Twenty years ago, it could be done with a rack-mounted server in the basement and crawling to the heart’s content; these days, the activity that web crawling creates looks an awful lot like malicious probing, and a comprehensive crawl could take many months to complete. It would require the cooperation of other organizations and existing search engines to seed content while an original index is constructed.
  • Most non-hypertext content is not conducive to crawling and indexing. While there are APIs for social media and library databases, they are not open to outright data dumps. Crawling is once again frowned upon as abusive activity – you’re stealing the social media website’s business, which is data. Many of these platforms require the user to be on the platform to browse all of its content.
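To make the first problem concrete, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production algorithm; the page names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration. The point it shows is that a work with no hyperlinks pointing to it (the hypothetical "scanned_book") can only ever receive the baseline score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by power iteration over a dict of page -> outgoing links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three websites and one physical book.
web = {
    "blog": ["university"],
    "news": ["university", "blog"],
    "university": ["news"],
    "scanned_book": [],  # physical media: no hyperlinks in or out
}
ranks = pagerank(web)
lowest = min(ranks, key=ranks.get)
```

Here `lowest` comes out as the book, no matter how good the book is, which is why a different signal (checkouts, or citations extracted by a language processor) would be needed.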

Now, if you can give it an academic or humanitarian twist – “oh, I am trying to make it easier for scholars to research social media trends and information in a centralized page” – perhaps one can woo over a large organization into providing a data set.

Reality

It’s Good Friday. In silence, I am met with the sobering reality of life, the fallout of the novel coronavirus, and my disconnection from everything I long for.

I look out the window for the thousandth time. It’s covered in pollen, as no one has cleaned it since the house’s construction twelve years ago.

I check my Facebook notifications, to see if anything has happened with my pokes. A post appears on my feed about a friend and his girlfriend and their celebration of their togetherness.

Without turning the light on, I look at myself in the mirror and gather up another summary of who I am. A man who sees life as it is at face value, unaltered and unblemished. But everyone knows too well what life really is: an endless struggle, a war for meaning. They want something supernatural, something magical that lets their minds run loose with imagination. To this effect, the stern look on my face has little new to offer.

I don’t think the market (the relationships market, the business market, the job market) is particularly interested in reality. Reality is static; it describes how things are right now. It imposes constraints on us. Some choose to accept these constraints, while others blissfully ignore them, but the constraints are there.

The sobering truth of human society is that we may never find an objective, universally accepted purpose for our existence – that is, a specific, concrete purpose that all of humanity will agree on. We tick like clockwork in our thought, though primitively we are animals influenced by feeling. In the mirror, my eyes reflect a dense emptiness, the result of a battle between this rational self and animal self. I know much about the world and its governance by mathematical processes, but my mind continues to wonder what all of it is useful for.

But if people can be offered evidence to just believe in something, then they invest their mind into it accordingly. Apple products, at the time, were magical in the public eye – people were convinced they could do anything, and at every release they bought them religiously. Likewise, stock market prices reflect people’s beliefs on how successful a company will become. People enter relationships, believing that their partner is really the one person they need in their lives. Jurors write “guilty” or “not guilty,” believing that the verdict they have chosen is the most appropriate one. Chess players make a move, believing that the move they have made is the one that will let them win the game.

It is the illusion of opportunity that entices one to make a decision, and the illusion of objectivity that confirms the decision.

If I continue to strangle myself in the constraints of reality – a reality that sees only cold facts and not opportunity – then will people ever see opportunity in me? They will see what I saw in the mirror – the face of a boy anxious about his future, longing but seemingly unable to find some shred of hope to cling to. But people want to see what they might enjoy with me, what jokes I might laugh at, what advice and experience I might have in store for them, even if I myself do not believe I have anything at all to offer. After all, is the reality that I have nothing to offer, or that I have far more to offer than I have ever given myself credit for?

So screw reality.


Looking back, I think that my thoughts may be mistaken as a foray into nihilism. Far from it. Humanity constantly strives to seek the truth, and if I eliminate that, then I am rejecting essentially all intellectual study.

When I think of reality, I think of all of the facts of the present day. In a distributed system, all nodes are striving to synchronize with each other to achieve a consistent state. However, there are some events that occurred in the past that did not make it to the consistent state; this is an unfortunate side effect of forcing a distributed system into consistency.

When I went to Japan three years ago, I wrote about some events and took pictures of others, but others yet remain solely in my mind. They cannot be reproduced, and there probably is little further evidence that they occurred. At some point, we must make a binary conclusion – that’s how the courts work, that’s how scientific inquiry works. Is it true, or is it false?

We need to take in the intricacies of reality and not allow ourselves to be constrained by mere “true” and “false.” Rather, we should take the time to explore the world and appreciate the things that we may never be able to put a “true” or a “false” on – arts, humanities, film, writing, sightseeing.

The anxiety of life

I haven’t written a public post since August. I apologize for this – ever since the day I decided to move to that dastardly co-op, I haven’t felt like I’ve had enough time to write here.

Though the present moment is not an excellent time to write (as the novel coronavirus pandemic has just struck the United States, sparking a major recession), I should focus on a holistic point of view.

Partly due to anxiety, about a third of my time has been consumed by indecision rather than by actual work. Should I take the online quiz now, or study more? If I keep on like this, will I make an A in the class? Should I eat dinner now, or in an hour? Will I be able to finish this assignment on time if I postpone it until tomorrow? Should I try to open conversation with this girl? Do I look good?

But most importantly, am I making any progress in life?

In some sense, yes: I am nearing completion of my computer science degree, which I will be able to finish one semester early. My finances are in very good shape, and last November, I was even able to begin investing in stocks. My third internship is slated to occur this summer, with even rent fully paid for by the company.

Yet I am fully consumed by anxiety and cynicism: cynical over humans’ systematic self-centeredness (including how I will inevitably fall victim to it myself) and anxious that I will be unable to find fulfillment in life. I find no easy solutions, and I find myself like a dog paddling vigorously in the water to stay afloat. The exertion turns out to be counterproductive: it exhausts one’s energy when one could remain afloat simply by lying back on the water.

I have defined my metric for life progress as my most difficult challenge: what is the size and quality of my retinue – how many friends have I gained the privilege of having? Who would actively stand behind me and what I believe in? The logic goes that if I can maximize this – if I can finally feel like I am enjoying the adventure of life together with other humans, instead of feeling like I am fundamentally in conflict with them – then I will consider my life purpose to be at least partly fulfilled.

But in reality, I use this metric as an excuse to compare myself to others. When I see others flourishing in social settings, I degrade myself because it seems they can find connection more easily than me, and therefore they have made better progress in life than me. They even go as far as finding girlfriends and boyfriends, leaving me to try to convince myself that I do not need any of those things because they are unattainable for me.

The frustration is that it is typically at around the end of the academic year when my social life finally begins to gain traction, and so the vine of grapes is lowered to an attractive and reachable level. Perhaps I am able to reach and take a few grapes of consequential friendships and events, but by the time these things occur, summer arrives and the vine is taken away. During those months I become delirious, craving social events that perhaps may lead to the discovery of a great friend or an adventure that will never be repeated.

The opportunities are there, but by the time I see them, they are missed. This is the crux of a social anxiety that has marred me for the majority of my life: in an attempt to recognize opportunities, I become paralyzed in indecision, wondering what will lead to the most consequential outcome. In the end, I take no drastic move, nothing happens, and I sulk around waiting for the next opportunity to come.

Perhaps it is my desperation for immediate, radical success that causes me to pursue only the most ambitious opportunities.

In summary, despite my objective, external successes, I am growing more and more discontent with my life, but do not know of a way I can gradually cause change in my life in order to dissipate this discontentment.

Film piracy

Film is a captivating, immersive medium that narrates a story with synchronized visual and audio tracks. Because movies, like television, can be consumed passively rather than requiring an active effort of imagination, they have reached far more Americans than books. With that success, the medium has become a lucrative business over the past one hundred years.

Today, the majority of Americans know of only one type of movie: the movie that is approved by the Motion Picture Association of America, an industry group that comprises six media conglomerates with near-exclusive access to what is commonly known as the movie theater.

The American has little access to the independent or international film; only streaming has eased these films’ entry into the market, and even then their exposure to Americans is often filtered through contracts managed by the Big Six.

The end result is the offering of a relatively small catalog of extremely popular mainstream movies, whose distribution is tightly controlled from production to presentation. A consumer can watch an American movie only through a handful of means:

  • If still in theaters, an MPAA-approved facility.
  • Blu-ray disc, with an up-to-date Blu-ray player and a high-definition TV with HDCP.
  • DVD, with a player of the correct region, assuming that the movie was distributed by DVD.
  • A constellation of video-on-demand services, assuming the service has an up-to-date contract with the distributor:
    • Netflix
    • Amazon Prime Video
    • iTunes

Ironically, the battle was lost decades ago in the music industry, with purchased music distributed with no DRM.

The problem is that due to the increasing volume of media and the tightening demands by licensors to provide adequate protection for media, the film medium as a whole is becoming increasingly inaccessible. One can no longer go to the video store, find anything to one’s content, and pop it into the player: one must sift through a variety of providers to see who actually distributes the movie, at a variety of price points.

And what happens if the content is not accessible? What if the movie or the TV show is not accessible in one’s country due to contractual limitations? What if technological advances prevent one from enjoying purchased content? What if the content is no longer distributed by anyone?

These are the reasons piracy is rationalized – not because people willfully intend to commit robbery, but because piracy has unfortunately made acquiring content far more convenient than a legitimate transaction does. Content is easily searchable, and its results are displayed on a minimalistic table that lists every version of the content ever released. A few clicks later, the content is downloaded as a file to one’s hard drive, and the download manager intelligently seeds the file back to its peers in a gesture of cooperation. The content (which, surprisingly to some, conforms to high quality standards) has already been decrypted, stripped of DRM, and is ready to be played back on virtually any device.

There is no question why the MPAA seeks to drill anti-piracy campaigns into the minds of Americans: because despite all of its efforts, and despite the illegality and unethical nature of piracy, the members of the MPAA have been unable to compete with the convenience of peer-to-peer downloading and DRM-free video.

For instance, suppose I wish to watch Cowboy Bebop: The Movie. While the movie’s theatrical reception was subpar, it is still original Cowboy Bebop content that is worth watching – like an extended episode. My ideal solution is to look it up on an ad-free, subscription-based streaming service such as Netflix, hit play, and then hit the “cast” button to any television in my house or even a monitor on my computer.

However, Cowboy Bebop: The Movie is not available. At all. Its only availability is through an obscure DVD release with varying prices, indicating that some DVDs are region-locked to Japan, while others can be played in the US.

Ultimately, the easiest way to play this is by searching it in a torrent database, downloading it, and then serving the file to the television.

It is the sad truth that although I have always played movies from discs and boxes in my personal possession, purchasing high-quality content and then playing it from servers in my personal possession is frowned upon as piracy. (Ironically, even Apple designed this correctly: movies can be streamed either from other computers that have the downloaded content, or from Apple’s own servers.)

Ultimately, piracy is a consequence of attempting to navigate a broken system of film distribution, and those astute enough to recognize this tend to pull away from mainstream media to enjoy more traditional media, such as books, which convey the same messages and experiences in more elegant terms.

A troubled relationship with GitLab

When I discovered GitLab in February, a rebellious passion flared up – a desire to break away from the omnipresent, walled-garden development ecosystem that is GitHub.

After GitHub had supposedly banned one of my developers for Attorney Online, that disdain for GitHub flooded over (although it was only later that I considered that perhaps he had involved himself in something he did not tell me about). I switched to GitLab in two days and was content with its fully-featured nature: it could show icons for repositories, group repositories into projects, mirror in both directions, and it even came with a fully-featured CI! Satisfied that I could escape the grasp of GitHub, I moved the main Attorney Online repos to GitLab to make a pipeline and allow my banned developer to contribute.

But six months on, the cracks in GitLab were beginning to show: odd bugs, thousands of issues on the GitLab main repo, slow page load times, and a seemingly endless number of switches and dropdowns on every panel. It was like Jira all over again – and things were not improving.

In those same six months, GitHub was making leaps and bounds to compete with GitLab’s bells and whistles. Adding features such as security bulletins, issue transferring, jump-to-definition, and sponsorships, GitHub was also trying to reel its open-source users back into the platform – and they also seemed to cut back on their omnipotent moderation, instead granting repositories the tools to moderate themselves.

The distinction was thus made clear to me: GitLab for enterprise, GitHub for community. Enterprises don’t care about simplicity, but hobby developers like myself do. Both also succeeded in adding features that were orthogonal to each other – only GitHub supports jump-to-definition, but only GitLab supports arbitrary mirroring rules.

After reverting my move to GitLab, I saw that GitLab was flexible enough to allow me to reap the benefits of both ecosystems – GitHub for project management, and GitLab for its advanced CI pipeline and artifact hosting.

In the end, it is inevitable for today’s developer world to spin around an indispensable GitHub: it is the product that tamed a complicated version control system and popularized it in a simple-to-use program for managing open-source projects.

Most stripped-down version of Chromium

I am curious how far Chromium can be stripped down for embedded applications. The natural thing to do, of course, is to not use HTML for embedded applications. But I wonder: can one strip down Chromium to its bare minimum? No video or audio, no WebGL, no MIDI, no CSS 3D animations, no WebSockets, no profiling, no extensions, no developer console – basically nothing except the layout engine, a JS engine, and a basic renderer. The target is to minimize binary size and memory footprint while still drawing from an actively maintained code base, as HTML 4 engines are not really maintained anymore, and many modern websites do not render properly on HTML 4 engines.

Just an idle thought. No action is needed.

Moral dilemma

I know I’ve wasted my time away when my joints feel stiff, my eyes inflamed, and my mind unstimulated. I had originally planned to watch a movie, but the plans fell through; instead, I’m playing TF2 to pass the time.

It’s difficult to stay motivated when most of my friends are in Austin, doing whatever. For me, the only real excitement will be flying to the East Coast again for the yearly plane competition, but undoubtedly there will be an extremely stressful problem that will counterbalance the excitement.


Using OpenPGP for video games

Cryptography seems to be all the rage these days: it is a method to strongly prove the occurrence of an event, and no one can question the universal truth of mathematics. The blockchain hype merely serves as witness to this trend toward cryptographic verification.

I see cryptography as a topic long avoided by mainstream software developers due to its core functionality being backed by pure math, a field which software developers are either not fond of or not competent in (or both). It is often perceived as a “black box” which ought not to be touched or recreated, lest one’s application be infested with thousands of security issues. However, cryptography is not purely math, and today, well-tested abstractions exist to make common cryptographic applications understandable and implementable.

Online video games tend to be backed by a central or master server, which places two main liabilities on the part of the maintainer of the game:

  • the responsibility of the maintainer to secure personal information within the server (such as email addresses and passwords) and to report security breaches; and
  • the regular maintenance of the server, which is essential to maintaining the ability for players to use the game.

However, as time progresses, the maintainer of a game is more likely to renege on these responsibilities for economic reasons, either causing the player’s experience to be significantly degraded or rendering the game entirely unplayable.

However, replacing a master server is not the main focus of this writing; rather, I wish to discuss an important issue in online games: who to trust.

I wish we could trust everyone – however, when a game gets sufficiently large, it becomes statistically likely for a player to decide to cheat or flood a server to attempt to break the game experience – in which case now there is one player we cannot trust.

With the rising popularity of virtual private networks and proxies, anonymity is king. It is nearly impossible to uniquely identify a player without prompting for a particular set of credentials by a master server. Even then, it is easy for a banned player to create new credentials and begin cheating or spamming again – the natural response is to increase the amount of personal information prompted by the central server, but this likewise increases the liability held by the owner.

One approach to mitigate this problem is by employing a cryptosystem specifically designed to solve the problem of trust – in our case, who should be allowed into a server, and who should not. OpenPGP is one such well-established system that uses a decentralized web of trust to systematically determine who can be trusted, without the liability of a centralized server or unintentionally restricting legitimate users from playing (such as banning an excessively large IP range, or requiring users to provide personal information that violates some users’ privacy).

OpenPGP lacks adoption primarily because of its unintuitive nature and its dependence on everyone using the system in order for its individual users to benefit from it (a collective action problem). However, the limited domain in which OpenPGP would operate allows it to be enforced behind the veil of abstraction. Users will never need to know what “keys” or “signing” are – they only know that they have an identity that they can optionally secure with a password. They can also “like” other users. If their identity becomes compromised, they can choose to “destroy” it forever. Behind the scenes, keys have a six-month expiration date that is automatically renewed simply by playing the game.

On the server side, the server operates an extended whitelist based on a basic whitelist that lists primary identities that are fully trusted. Identities that are (indirectly) trusted by those primary identities may also be allowed to join the server. After the authentication succeeds, the server can reliably recognize the identity of a player, which is useful if the player has a specific rank or level on that server.
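As a rough sketch of how a server might derive the extended whitelist, one could walk the "like" graph outward from the primary identities and stop after a fixed number of hops. Everything here is an assumption for illustration: the function name, the fingerprint strings, and the `max_hops` limit are hypothetical, and real OpenPGP implementations weigh trust more carefully than a plain hop count.

```python
from collections import deque


def trusted_identities(primaries, signatures, max_hops=2):
    """Return every identity within max_hops signatures of a primary identity.

    signatures maps a signer's key fingerprint to the set of fingerprints
    that the signer has vouched for ("liked").
    """
    allowed = set(primaries)
    queue = deque((fingerprint, 0) for fingerprint in primaries)
    while queue:
        signer, hops = queue.popleft()
        if hops == max_hops:
            # Too far from a primary identity to extend trust any further.
            continue
        for subject in signatures.get(signer, ()):
            if subject not in allowed:
                allowed.add(subject)
                queue.append((subject, hops + 1))
    return allowed


# Hypothetical identities; real fingerprints would be hex strings.
primaries = {"ADMIN"}
signatures = {
    "ADMIN": {"ALICE"},
    "ALICE": {"BOB"},
    "BOB": {"CAROL"},
    "MALLORY": {"ADMIN"},  # vouching for the admin earns no trust in return
}
allowed = trusted_identities(primaries, signatures, max_hops=2)
```

With these inputs, ADMIN, ALICE, and BOB may join; CAROL is one hop too far, and MALLORY is never reached because trust flows only from signer to subject.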

If all servers are whitelisted, then how can new players join? An optional centralized server can automatically grant new players trust temporarily, which allows them to join “newbie” servers and gain trust until they find themselves allowed into other servers. While this disincentivizes curiosity, it incentivizes playing in the same server as others, as well as social integration into the community. Alternatively, new players can also request trust through a side channel, such as a community chat server or forum outside of the game.

If a player has lost the trust of the community by breaking a major rule or through social engineering, other players can revoke trust just as easily as they imparted it: the OpenPGP system allows for signature revocation.
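A minimal sketch of that bookkeeping follows, assuming a hypothetical in-memory `TrustLedger` rather than a real keyring. In OpenPGP, a revocation is published as its own signature instead of deleting the original one, which is why the sketch keeps revocations in a separate set. Letting a later "like" undo a revocation is my own design assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TrustLedger:
    """Hypothetical record of who has vouched for whom."""
    signatures: set = field(default_factory=set)  # (signer, subject) pairs
    revoked: set = field(default_factory=set)     # pairs that were withdrawn

    def like(self, signer, subject):
        # Vouch for a player; a fresh "like" supersedes an earlier revocation.
        self.signatures.add((signer, subject))
        self.revoked.discard((signer, subject))

    def revoke(self, signer, subject):
        # Only a signature that was actually made can be revoked.
        if (signer, subject) in self.signatures:
            self.revoked.add((signer, subject))

    def vouchers_for(self, subject):
        # Signers whose vouching for this subject is still in effect.
        active = self.signatures - self.revoked
        return {signer for (signer, target) in active if target == subject}


ledger = TrustLedger()
ledger.like("ALICE", "BOB")
ledger.like("CAROL", "BOB")
ledger.revoke("ALICE", "BOB")
remaining = ledger.vouchers_for("BOB")   # only CAROL still vouches
ledger.revoke("CAROL", "BOB")
nobody = ledger.vouchers_for("BOB")      # empty: BOB has lost all trust
```

Once `vouchers_for` comes back empty, a whitelisting server would have no path of trust left to the player and could refuse the connection.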

In short, an ideal implementation of public key-based infrastructure in video games would be seamless to users, eliminate the costly upkeep of a strong centralized server, and encourage regular social activity.

Noise

I woke up reading the wrong part of the Internet. Not wrong as in factually wrong – wrong as in something that does not help to stabilize my mind or bring me to a higher level of understanding of the world and that which is beyond our grasp.

The polar divide in ideology which I witness today is, from what I can see, a mostly American phenomenon spurred by intense political competition. While other countries have seen localized and sometimes violent conflict between two sides (often racially driven), never before has a war of information been seen at this scale and ferocity.

Being acutely aware of all of this, I feel great anxiety at the thought that every side and opinion has its fair share of criticism – as if everything in the world were wrong. Everything can be criticized, discredited, falsified, and undercut. And yet that is not a very helpful perspective, either – it just means that under the eyes of someone else, what we do is potentially “wrong.”

It’s difficult to step back and reevaluate why we believe what we believe, and why others believe what they believe, without demonizing or making a mortal enemy out of an “opposing camp.” It’s not possible to agree with everything, yet it is a necessity to respect those with whom we disagree.

This is the paradox of individualism and unity. The economic theory of comparative advantage requires us to work together to compose a society that runs for the benefit of all of its members; yet, individualism implores each of us to scrutinize the world under our own lenses, and to form our own opinions that may oppose the opinions of others.

Despite all of these concerns clouding my mind, somehow I can feel happy and accomplished. Somehow, I take pride in my accomplishments and rejoice in the successes of others. Somehow, I can just be myself and not worry.