[{"content":" Full paper: seam_carving.pdf · Code: YuXiangLo/NTUPDP2026\nJoint work with Ya-Chen Wu and Yu-Hsiang Lo, NTU Parallel Programming, Spring 2026.\nThe Starting Point Seam carving is a content-aware image resizing algorithm. Instead of cropping or scaling, it removes the least important pixel paths — seams — from the image, leaving the interesting parts intact. The results look surprisingly good.\nThe algorithm has two heavy stages. First, compute an energy map (gradient magnitude, one pass over the image). Then, run a dynamic program (DP) to find the lowest-energy connected path from top to bottom. Remove the seam. Repeat.\nOn a CPU, this is slow for large images. A single 8K frame (7680×4320, ~33 million pixels) can take hundreds of milliseconds per seam on one core. GPUs exist. We have a V100. The energy stage is embarrassingly parallel. The question seemed obvious: how much faster can we make this?\nWe expected to find a novel optimization and land a big number. We ended up proving why a big number is structurally impossible — and that turned out to be the more interesting result.\nThe Problem With the DP Profiling the pipeline on a 1428×968 image with Nsight Compute reveals the problem immediately: the DP kernel consumes ~93% of per-seam wall time. Everything else is noise.\nThe DP is:\nM[i][j] = e[i][j] + min(M[i-1][j-1], M[i-1][j], M[i-1][j+1])\nEach row depends on the row above it. All H rows are serial. Within each row, the W columns are parallel — but you cannot start row i until row i-1 is complete, and no reordering eliminates this. This is the wall we kept running into.\nThe Optimization Path We built five progressively better kernels, guided by Nsight at each step.\nNaive launches one CUDA kernel per row. That\u0026rsquo;s H−1 kernel launches per seam, with a full global-memory round-trip each time. At 4K (H=2160), that\u0026rsquo;s 2159 kernel launches for one seam. 
Overhead dominates on small images; global memory traffic dominates on large ones.\nFused collapses all H rows into a single kernel using shared-memory ping-pong buffers. One __syncthreads() between rows. One kernel launch replaces 2159. The cost array never hits DRAM between rows — it lives in the 96 KB shared memory budget. This is the right structure; the question is how fast we can make it.\nPrefetch came from reading Nsight output, not intuition. The dominant stall on Fused was long_scoreboard — every warp was waiting ~200 cycles for the global load of the current energy row e[i][·]. The fix is software pipelining: while computing row i (which only reads shared-memory cost data), each thread issues the global load for e[i+1][·] into a register. By the time __syncthreads() completes, the load has resolved. Arithmetic and memory transfer overlap. This gave the largest single jump: 1.67× over Fused at the kernel level.\nBytePtr noticed that back-pointers — one of {-1, 0, +1} — were stored as 4-byte int. Switching to signed char cuts back-store traffic 4× at no cost in expressiveness. At 4K (3840×2160), the back-pointer array shrinks from 32 MB to 8 MB per seam pass.\nTiled is the piece that finally activated the other 79 SMs we\u0026rsquo;d been leaving idle. The idea: divide the W columns into K=60 strips (one per SM). Each strip gets T halo columns on each side. Within a tile of T rows, one block works entirely in shared memory, only writing its center strip to global memory at the tile boundary.\nThe correctness argument is a seam-drift bound: within T rows, a seam shifts at most T columns. Any error stays inside the halo. The center is bit-exact.\nThe result: kernel launches drop from H to ⌈H/T⌉ (68 at 8K with T=64), and all 60 SMs are active simultaneously. 
Output is bit-identical to the single-block designs.\nThe Numbers At 8K (7680×4320), Tiled reaches 5.73 ms/seam — the best exact result across all our designs.\nMethod ctrl 1080p 2K 4K 6K 8K CPU 1 core 5.00 21.6 37.5 83.6 212 325 CPU best (32-thread) 1.14 3.59 5.63 9.88 25.6 35.4 Naive 1.37 3.10 3.77 5.92 10.3 13.9 Tiled (ours) 0.314 0.917 1.25 2.16 4.16 5.73 vs CPU best 3.6× 3.9× 4.5× 4.6× 6.2× 6.2× 57× over a single CPU core. 6.2× over a tuned 32-thread NUMA-aware OpenMP baseline. That last number is the ceiling — and the reason it\u0026rsquo;s a ceiling is the interesting part.\nThree Reasons the Ceiling Is Structural At this point, a reasonable question is: are we just not trying hard enough? We spent a fair amount of time asking the same question. The answer is no, and we can show it three ways.\n1. The DP critical path is irreducible.\nThe dependency graph has a critical path of length H — one node per row. Any schedule needs at least H sequential min-steps. A GPU can parallelize within each row (the W columns), but it cannot reduce the H serial steps. The best possible per-row parallelism is already what Tiled achieves.\n2. A transpose cannot help.\nA natural thought: rotate the image, run the DP in the other direction. But a transpose is a bijection on storage addresses — it relabels storage without changing the dependency graph. The critical path stays H. If you transpose and then carve, you\u0026rsquo;re computing horizontal seams (depth W instead of H), which are just the orthogonal task — not a faster version of the original one. We confirm this empirically: the transposed direction is slower, because W \u0026gt; H for wide images.\n3. Removing the DP entirely gives no speedup.\nThis was the most surprising result. We implemented a greedy non-cumulative selection — no cost matrix, no serial dependency, fully parallel. It matches Tiled within 3% at every resolution. Why? Because the per-seam pipeline, not the DP, sets the end-to-end latency. 
At 8K, 5.73 ms splits into ~4.3 ms DP and ~1.4 ms preprocessing. The preprocessing floor (energy compute + seam removal) requires reading every pixel\u0026rsquo;s energy and writing the output — that\u0026rsquo;s ~398 MB at 8K, and against the V100\u0026rsquo;s ~900 GB/s bandwidth, it costs ~1.4 ms regardless of what selection method you use. Greedy has no DP, but it still pays the same bandwidth floor. It can\u0026rsquo;t win.\nThe One Lever That Works There is one escape: change the problem.\nApproximate batch removal amortizes one DP pass over K=60 seams. One forward pass produces 60 strip-local seam candidates; a single kernel removes all 60 in one image-wide read/write pass. This changes the per-seam cost from Θ(HW) per seam to Θ(HW/K).\nThe result: 0.103 ms/seam at 8K — 56× over the exact single-seam pipeline. The trade-off is strip-local, not global, optimality: the seams found are each the best within their strip, not provably the global minimum.\nThis is the batch mode that production pipelines actually use. The exact per-seam formulation isn\u0026rsquo;t the important case; we study it because it\u0026rsquo;s the case where the structural limits are cleanest to analyze.\nWhat We Actually Learned We went in expecting to beat prior GPU work with a novel technique. We did build the strongest exact pipeline we could find — Tiled is faster than Kim et al.\u0026rsquo;s exact single-block DP at every resolution, and Nsight-guided latency hiding made an approach they abandoned competitive again. But 6.2× over a tuned CPU is not the 76× headline their approximate method achieves.\nThe more valuable output is understanding why. The DP\u0026rsquo;s critical path is irreducible. A transpose doesn\u0026rsquo;t help. Removing the DP doesn\u0026rsquo;t help. The bandwidth floor exists regardless of algorithm. 
These aren\u0026rsquo;t failures of the implementation — they\u0026rsquo;re properties of the problem.\nSometimes the most useful result is a tight characterization of the achievable frontier, with impossibility arguments that explain why no further engineering will move it. We spent more time proving that the ceiling is real than we did hitting it, and I think that\u0026rsquo;s the right way to read the paper.\nThe one large lever is amortization. If you\u0026rsquo;re willing to trade global optimality for strip-local optimality, batch mode gives you 56×. If you need exact results, 6.2× over a strong multicore CPU is close to what\u0026rsquo;s achievable on a V100 — and the gap between that and the bandwidth floor (~1.3×) tells you there isn\u0026rsquo;t much left.\n","permalink":"https://lien0214.github.io/en/posts/seam-carving-gpu/","summary":"\u003cblockquote\u003e\n\u003cp\u003eFull paper: \u003ca href=\"https://github.com/YuXiangLo/NTUPDP2026/blob/main/paper/seam_carving.pdf\"\u003eseam_carving.pdf\u003c/a\u003e · Code: \u003ca href=\"https://github.com/YuXiangLo/NTUPDP2026\"\u003eYuXiangLo/NTUPDP2026\u003c/a\u003e\u003cbr\u003e\nJoint work with Ya-Chen Wu and Yu-Hsiang Lo, NTU Parallel Programming, Spring 2026.\u003c/p\u003e\n\u003c/blockquote\u003e\n\u003ch2 id=\"the-starting-point\"\u003eThe Starting Point\u003c/h2\u003e\n\u003cp\u003eSeam carving is a content-aware image resizing algorithm. Instead of cropping or scaling, it removes the \u003cem\u003eleast important\u003c/em\u003e pixel paths — seams — from the image, leaving the interesting parts intact. The results look surprisingly good.\u003c/p\u003e\n\u003cp\u003eThe algorithm has two heavy stages. First, compute an energy map (gradient magnitude, one pass over the image). Then, run a dynamic program (DP) to find the lowest-energy connected path from top to bottom. Remove the seam. Repeat.\u003c/p\u003e","title":"We Tried to Make Seam Carving Fast on a GPU. 
Then We Proved Why It Can't Be."},{"content":"The Problem When a language model has been fine-tuned to believe X, but its pretraining says Y — which wins? And more importantly, can you steer that behavior without retraining?\nWhat We Built Cross-format data augmentation pipeline for ConflictQA and ConflictBank. Generates semantically equivalent question variants to stress-test knowledge conflict handling across paraphrase and format shifts — essential for building benchmarks that don\u0026rsquo;t overfit to surface form.\nActivation steering experiments: extracted fine-tuned behavioral vectors from LoRA-trained models and applied them to frozen base models. The behavioral direction is surprisingly transferable — a steering vector from a LoRA-trained model can push a base model\u0026rsquo;s completions in the expected direction with zero weight updates.\nKey Finding Fine-tuning injects directional bias into the residual stream in a way that\u0026rsquo;s partially separable from task knowledge. If you can steer behavior without weights, you can also potentially undo fine-tuning signals. This has direct implications for alignment work.\n","permalink":"https://lien0214.github.io/en/projects/mem-gen/","summary":"\u003ch2 id=\"the-problem\"\u003eThe Problem\u003c/h2\u003e\n\u003cp\u003eWhen a language model has been fine-tuned to believe X, but its pretraining says Y — which wins? And more importantly, can you steer that behavior without retraining?\u003c/p\u003e\n\u003ch2 id=\"what-we-built\"\u003eWhat We Built\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eCross-format data augmentation pipeline\u003c/strong\u003e for ConflictQA and ConflictBank. 
Generates semantically equivalent question variants to stress-test knowledge conflict handling across paraphrase and format shifts — essential for building benchmarks that don\u0026rsquo;t overfit to surface form.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eActivation steering experiments\u003c/strong\u003e: extracted fine-tuned behavioral vectors from LoRA-trained models and applied them to frozen base models. The behavioral direction is surprisingly transferable — a steering vector from a LoRA-trained model can push a base model\u0026rsquo;s completions in the expected direction with zero weight updates.\u003c/p\u003e","title":"ICC Mem/Gen"},{"content":"The System A production-grade trading system built on FinLab\u0026rsquo;s real-time financial data. Covers the full pipeline from strategy development to automated execution.\nStrategy layer: Momentum-based and mean-reversion signals on daily OHLCV data. The TWSE is a retail-dominated market with different microstructure than US equities — many patterns from Western quant literature don\u0026rsquo;t transfer cleanly, so we research Taiwan-specific signals.\nExecution: Automated via crontab. Three cronjobs handle account data fetching, independent backtesting reports, and order execution via Yuanta Fugle API and Sinopac Shioaji API. Multi-user and multi-broker support configured through config.yaml.\nDashboard: Web UI (Dash + Bootstrap + Gunicorn) showing trade info, account balance charts, and monthly return visualizations. Deployed with Docker Compose.\nCurrently: Evaluating ML-based signal combination. Live paper-trading is running.\nStack FinLab · shioaji · ta-lib · pandas · Dash · Flask · Docker\n9 stars on GitHub · v3.1.0 released Jan 2026\n","permalink":"https://lien0214.github.io/en/projects/stock-analysis/","summary":"\u003ch2 id=\"the-system\"\u003eThe System\u003c/h2\u003e\n\u003cp\u003eA production-grade trading system built on FinLab\u0026rsquo;s real-time financial data. 
Covers the full pipeline from strategy development to automated execution.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStrategy layer\u003c/strong\u003e: Momentum-based and mean-reversion signals on daily OHLCV data. The TWSE is a retail-dominated market with different microstructure than US equities — many patterns from Western quant literature don\u0026rsquo;t transfer cleanly, so we research Taiwan-specific signals.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExecution\u003c/strong\u003e: Automated via crontab. Three cronjobs handle account data fetching, independent backtesting reports, and order execution via Yuanta Fugle API and Sinopac Shioaji API. Multi-user and multi-broker support configured through \u003ccode\u003econfig.yaml\u003c/code\u003e.\u003c/p\u003e","title":"Algorithm Trading — TWSE"},{"content":"Yesterday marks the end of a remarkable chapter for me at ShopBack, as I wrap up my six-month journey on Town Hall day — filled with gratitude, memories, and excitement for what\u0026rsquo;s ahead.\nOver the past half year as a Software Engineer Intern in the Backend Core-Experience Team, I\u0026rsquo;ve had the chance to work across a variety of impactful projects — from Customer and Watchlist Service, to Notification Service, and finally the Travel project. It\u0026rsquo;s been an incredible experience contributing to a high-traffic, real-world system and making my first tangible impact as an engineer.\nI started figuring things out, sometimes struggling, but always learning and having fun. 
Even though I\u0026rsquo;ve only scratched the surface of large-scale system design, I feel grateful to have been part of such an enthusiastic, collaborative team.\nThrough these experiences, I deepened my approach to problem solving — especially leveraging AI tools to go end-to-end from analysis and spec reviews to implementation and reporting.\nI\u0026rsquo;m thankful to George Lee, my manager, for his wisdom and for showing me what it means to \u0026ldquo;work smarter.\u0026rdquo; I also want to thank Shih-Yuan Chen and 林一豪 for their mentorship in Customer Service, Hao-Ping Shih for his support in Notification Service, and Nick Chen and Tim Pei for guiding me on the Travel project — and to everyone else whose guidance and collaboration helped shape my internship experience.\nWorking at ShopBack has been both inspiring and exciting — being surrounded by a company that\u0026rsquo;s growing fast and a culture that encourages ownership and curiosity.\nThough this chapter closes, I\u0026rsquo;m taking all the lessons, friendships, and energy forward into what\u0026rsquo;s next.\nOnward and upward.\n","permalink":"https://lien0214.github.io/en/posts/leaving-shopback/","summary":"\u003cp\u003eYesterday marks the end of a remarkable chapter for me at ShopBack, as I wrap up my six-month journey on Town Hall day — filled with gratitude, memories, and excitement for what\u0026rsquo;s ahead.\u003c/p\u003e\n\u003cp\u003eOver the past half year as a Software Engineer Intern in the Backend Core-Experience Team, I\u0026rsquo;ve had the chance to work across a variety of impactful projects — from Customer and Watchlist Service, to Notification Service, and finally the Travel project. 
It\u0026rsquo;s been an incredible experience contributing to a high-traffic, real-world system and making my first tangible impact as an engineer.\u003c/p\u003e","title":"Six Months at ShopBack: Wrapping Up My Backend Internship"},{"content":"The Context Shopback\u0026rsquo;s backend runs on a set of microservices — each service owns its domain, communicates over the network, and scales independently. This architecture is well-understood and has obvious benefits. It also has a cost: idle containers still occupy allocated resources. When a service is running but not handling traffic, you\u0026rsquo;re paying for it anyway.\nThe decision to migrate toward a monorepo structure was partly about this. With a TypeScript monorepo, multiple services can share the same process and resource pool. The idle overhead shrinks. The machine bill shrinks with it.\nMy subtask was specific: where two services currently communicated via API calls across the network, migrate them so they call each other as direct function invocations within the monorepo. On paper, this sounds like a refactor. In practice, it turned into the hardest thing I worked on during the internship.\nWhy It Was Hard The first layer of difficulty was technical. Shopback\u0026rsquo;s TypeScript monorepo structure had been designed primarily for libraries and shared packages — the common use case for monorepos. Running actual backend services inside that structure is a different problem. Services have their own startup lifecycle, their own framework dependencies, their own runtime concerns. Libraries don\u0026rsquo;t.\nSome engineers on the team had already implemented parts of the monorepo, but for library use cases. Their code existed, I could read it, but it didn\u0026rsquo;t map cleanly to what I needed to do. 
The design patterns they used made assumptions that didn\u0026rsquo;t hold for running services — specifically around how different backend frameworks (we were using multiple, across different services) interacted with the monorepo\u0026rsquo;s dependency resolution.\nThere were also version constraint issues. TypeScript monorepos manage package versions at the workspace level, and certain inter-service dependencies had version assumptions baked in that created conflicts when you tried to run them together. It wasn\u0026rsquo;t a bug you could grep for. It was the kind of thing you find by running the code and watching it fail in ways that don\u0026rsquo;t have obvious stack traces.\nThe Mentor Boundary The second layer of difficulty was that my mentor hadn\u0026rsquo;t done this before either.\nThis isn\u0026rsquo;t a criticism — it\u0026rsquo;s just the reality of working on something that hadn\u0026rsquo;t been fully attempted in this specific configuration. He had his own work, and he was helpful in the ways he could be. But for the specific migration problem I was solving, he was starting from the same place I was.\nSo I was working independently on a technical problem with no clear prior art on my immediate team, in a codebase that was large enough that understanding the relevant parts took real time, with a framework setup that wasn\u0026rsquo;t well-documented internally.\nFor two or three sprints, I made slow progress. I could see the shape of what needed to happen. Getting the implementation to actually work was a different matter.\nThe Slack Message Eventually I did something I had been hesitant to do: I cold-messaged a senior engineer on another team who I knew had touched adjacent parts of the codebase.\nI don\u0026rsquo;t know why I waited as long as I did. Some combination of not wanting to bother someone I\u0026rsquo;d never worked with directly, and an assumption that I should be able to figure it out on my own. 
Neither of these was a good reason to stay stuck.\nHe replied. He walked me through the approach he\u0026rsquo;d used for a related problem. I looked at his code. The pieces clicked into place, and I was able to finish the migration within the next sprint.\nThree or four sprints total. A significant part of the internship.\nWhat I Carried Away The technical lesson is real but not the important one. Yes, TypeScript monorepos have constraints when you\u0026rsquo;re running services rather than just sharing packages. Yes, framework version boundaries matter. I learned those things concretely. They\u0026rsquo;ll be useful.\nThe more durable lesson is about how to work when you\u0026rsquo;re stuck.\nI had bounded my resources. I was looking at the team directly around me, the internal documentation I could find, and the code I could read. None of that was enough. The answer was one Slack message away from someone I\u0026rsquo;d never spoken to.\nLooking back from now — mid-2026, almost a year later — I can see that I let that problem sit in my head longer than it needed to because I was reluctant to define \u0026ldquo;asking for help\u0026rdquo; broadly enough. The senior engineer on another team wasn\u0026rsquo;t my mentor. He wasn\u0026rsquo;t assigned to help me. But he was the person who had the knowledge I needed, and most people are willing to share knowledge when you ask clearly.\nDon\u0026rsquo;t limit your help-seeking to the people who are nominally responsible for helping you. The right person might be somewhere else in the company. Or on another team. Or, sometimes, a stranger you find a way to reach.\nThe constraint was in my head, not in the org chart.\n","permalink":"https://lien0214.github.io/en/posts/shopback-monorepo/","summary":"\u003ch2 id=\"the-context\"\u003eThe Context\u003c/h2\u003e\n\u003cp\u003eShopback\u0026rsquo;s backend runs on a set of microservices — each service owns its domain, communicates over the network, and scales independently. 
This architecture is well-understood and has obvious benefits. It also has a cost: idle containers still occupy allocated resources. When a service is running but not handling traffic, you\u0026rsquo;re paying for it anyway.\u003c/p\u003e\n\u003cp\u003eThe decision to migrate toward a monorepo structure was partly about this. With a TypeScript monorepo, multiple services can share the same process and resource pool. The idle overhead shrinks. The machine bill shrinks with it.\u003c/p\u003e","title":"The Migration Nobody on My Team Had Done Before"},{"content":"There was a feature I worked on at ShopBack that I almost didn\u0026rsquo;t think twice about.\nThe ask was simple: the PM had a design for a selection panel inside the app — a small UI element where users choose from a list. The design called for adding merchant icons to each entry. My part was coordinating with the frontend team to make sure the integration was wired up correctly on the backend side. It wasn\u0026rsquo;t technically involved. It was cooperation work: PM had a vision, design had an artifact, and I helped connect the pieces so it could ship.\nThat was it.\nAfter it deployed, a stakeholder shared the numbers. The entry rate for that panel — the percentage of users who saw the page and actually clicked into the selection panel — had moved from roughly 10% to over 40%. A lift of more than 30 percentage points.\nI remember reading that and pausing.\nThe Implicit Model I Had Been Carrying Before this, I\u0026rsquo;d had a quiet assumption about how impact works in software: it scales with difficulty. The harder the problem, the more it matters. System migrations, performance refactors, non-obvious algorithms — those are the real work. A UI tweak? Icons on a list? That\u0026rsquo;s almost not engineering.\nThe data disagreed.\nThat panel change touched a significant number of user sessions. It changed what people did when they opened the app. 
Meanwhile, some of the work I\u0026rsquo;d spent much more time on — cross-service API migrations, error rate reductions — was genuinely important, but its user-visible effect was quieter. Infrastructure work often is.\nThe icons were one afternoon of coordination and a behavioral shift of more than 30 percentage points.\nWhy Good Design Is Real Work The thing the icons actually did was give the panel identity. Before them, the entries were probably just text or a generic list. After, each row had a visual anchor — something that says \u0026ldquo;this is a real thing, a recognizable thing, worth your attention.\u0026rdquo; Users responded to that signal.\nThis isn\u0026rsquo;t a new idea. But there\u0026rsquo;s a difference between knowing that design matters and watching it matter on a metric you can point to.\nWhat I updated was the scope of \u0026ldquo;engineering contribution.\u0026rdquo; Getting the backend integration right so the icons loaded correctly, at the right time, without breaking the panel\u0026rsquo;s existing behavior — that\u0026rsquo;s a real contribution. It\u0026rsquo;s not glamorous. But it was the thing that made the PM\u0026rsquo;s vision shippable, and the shippable thing was the thing that moved the number.\nOn Pride Something I didn\u0026rsquo;t fully expect was how it felt when the numbers came in.\nThe pride wasn\u0026rsquo;t about technical difficulty. It was about effect. Something I touched changed what a large number of people did when they opened the app. That\u0026rsquo;s a different kind of satisfaction from solving a hard problem cleanly — less intellectual, more connected to actual people.\nI think this is the version of pride worth chasing in software. 
Not \u0026ldquo;I solved something hard\u0026rdquo; but \u0026ldquo;something I built is now part of how people move through the world.\u0026rdquo; Even if the movement is small — tapping a different button, seeing something they didn\u0026rsquo;t see before.\nMaking an impact doesn\u0026rsquo;t always look like the hardest problem in the room. Sometimes it looks like a row of icons.\n","permalink":"https://lien0214.github.io/en/posts/shopback-impact/","summary":"\u003cp\u003eThere was a feature I worked on at ShopBack that I almost didn\u0026rsquo;t think twice about.\u003c/p\u003e\n\u003cp\u003eThe ask was simple: the PM had a design for a selection panel inside the app — a small UI element where users choose from a list. The design called for adding merchant icons to each entry. My part was coordinating with the frontend team to make sure the integration was wired up correctly on the backend side. It wasn\u0026rsquo;t technically involved. It was cooperation work: PM had a vision, design had an artifact, and I helped connect the pieces so it could ship.\u003c/p\u003e","title":"The Small Feature That Changed How I Think About Impact"},{"content":"The Setup April 2025. Six people, one semester, and a goal: ship a complete 2D platformer in Unity.\nThe game is called Bubblo. You play as a bubble creature exploring a pixel-art Wonderland, rescuing villagers trapped in cages, and fighting off needle-type enemies — bees, jumping spiders, a unicorn that very much wants to pop you. The core mechanic is bubble physics: bouncing and floating through levels, using your softness as both a movement tool and a combat advantage.\nI was the project lead. My job was to translate a loose set of ideas into an actual, shippable game — and to keep five other people moving in the same direction for three months without anyone quitting.\nWhat I Brought from CMoney A few months before this, I was wrapping up my internship at CMoney as a backend engineer. 
Nine months of real agile: sprint planning, daily standups, sprint reviews, retrospectives. I\u0026rsquo;d sat in enough of those meetings to have a surface-level understanding of what they were for.\nSo going into Bubblo, I had a framework. We\u0026rsquo;d run two-week sprints. We\u0026rsquo;d have a proper task board. We\u0026rsquo;d do retrospectives. I knew the vocabulary.\nWhat I didn\u0026rsquo;t fully understand yet was that the vocabulary came with assumptions baked in — assumptions about how people structure their time, what \u0026ldquo;availability\u0026rdquo; means, and what a team\u0026rsquo;s primary obligation is.\nAt a company, everyone\u0026rsquo;s primary job is the project. The sprint is the container. Agile ceremonies work because they\u0026rsquo;re designed around full-time focus.\nAt school, that\u0026rsquo;s not true for anyone.\nWhy the Ceremonies Didn\u0026rsquo;t Work Our team had lab obligations, other coursework, extracurriculars, internship applications. One person was juggling a research paper deadline midway through our sprint. Another had a part-time job. Everyone was a full person with a full life outside of Bubblo.\nRunning a proper sprint planning session meant asking people to block two hours they didn\u0026rsquo;t have. Running retrospectives felt performative when the real blocker was simply that people were busy and felt guilty about it. Daily standups became anxiety triggers instead of sync mechanisms.\nThe rigid structure was creating friction instead of removing it.\nDesigning Something That Actually Fit So I stripped it down.\nThe task board stayed — that was the most valuable thing. A shared view of what existed, what was in progress, and what was blocked. It cost everyone zero ceremony to use asynchronously.\nI replaced scheduled standups with async check-ins: a short message at the start of each work session, no format required, just \u0026ldquo;working on X today, blocked by Y if anything.\u0026rdquo; Low pressure. 
Actually read.\nThe weekly sync became one meeting with a single agenda: what\u0026rsquo;s blocking someone right now that I can unblock in the next 30 minutes? Not status updates. Not progress reports. Just blockers. It ran 20-30 minutes and ended when we ran out of blockers.\nThe underlying principle was: respect that everyone\u0026rsquo;s primary obligation isn\u0026rsquo;t this project, but make it easy to contribute when they do have time. Compact, not loose.\nThe OOP Refactor Nobody Wanted (But We Did Anyway) About six weeks in, we had a problem. The avatar behaviour code had grown organically and was becoming a tangle. Multiple people were touching it, and changes in one place were breaking things in another in ways that were hard to trace.\nWe ran a refactor. Mid-sprint. On a student timeline.\nBefore any game code was written, I had designed the component architecture around Observer, State Machine, and Command patterns — mostly on paper. The refactor was about actually enforcing them in the implementation. Making the state transitions explicit. Making the event system the single source of truth for cross-component communication.\nIt was the right call. After the refactor, adding a new enemy type or a new player state became a contained operation. The last third of the sprint was noticeably less chaotic than the first two-thirds.\nThe timing felt wrong, but the alternative — continuing to patch a tangled codebase — would have been worse. Some technical debt compounds fast enough that you have to pay it early.\nThe AI Context We used ChatGPT throughout the project. This was before any of the IDE agent tooling existed — no Copilot Agents, no integrated code generation beyond autocomplete. Mostly it was: describe a problem, get a sketch, adapt the sketch.\nLooking back a year later, the tooling has changed completely. A project like this today would look different — faster iterations, less time on boilerplate, different kinds of mistakes. 
I\u0026rsquo;m curious what gets harder as the tools improve, not just what gets easier. Probably the architecture decisions — the ones that don\u0026rsquo;t have a clear right answer and require the team to commit to a direction and live with it.\nWhat I Actually Learned The game shipped. 600+ commits, five playable levels, full audio, 60fps WebGL. We were proud of it.\nBut the thing I carry forward isn\u0026rsquo;t the Unity knowledge or even the design patterns — it\u0026rsquo;s the calibration on workflow design.\nAgile isn\u0026rsquo;t a formula. It\u0026rsquo;s a set of principles built on specific assumptions about context. When the context changes — and student teams are a very different context than a company — the principles still apply, but the ceremonies have to be redesigned from scratch. The question is always: what is this ritual actually trying to achieve, and is this the lightest way to achieve it?\nThe other thing: project management in a peer team is mostly emotional work. Keeping people unblocked isn\u0026rsquo;t just about task coordination — it\u0026rsquo;s about making sure nobody feels like their contribution doesn\u0026rsquo;t matter, or that the gap between \u0026ldquo;what I said I\u0026rsquo;d do\u0026rdquo; and \u0026ldquo;what I actually did\u0026rdquo; is going to be held against them. That kind of safety is what lets people show up when they have time, instead of avoiding the project because it feels like a source of guilt.\nI didn\u0026rsquo;t fully understand that going in. I do now.\n","permalink":"https://lien0214.github.io/en/posts/bubblo-devlog/","summary":"\u003ch2 id=\"the-setup\"\u003eThe Setup\u003c/h2\u003e\n\u003cp\u003eApril 2025. Six people, one semester, and a goal: ship a complete 2D platformer in Unity.\u003c/p\u003e\n\u003cp\u003eThe game is called Bubblo. 
You play as a bubble creature exploring a pixel-art Wonderland, rescuing villagers trapped in cages, and fighting off needle-type enemies — bees, jumping spiders, a unicorn that very much wants to pop you. The core mechanic is bubble physics: bouncing and floating through levels, using your softness as both a movement tool and a combat advantage.\u003c/p\u003e","title":"Leading a Game Project in a Semester: What Agile Doesn't Prepare You For"},{"content":"The End of the First One April 20th. My last day at CMoney after nine months as a Backend Development Engineer Intern.\nFirst internships are strange. You don\u0026rsquo;t know what you don\u0026rsquo;t know, so you can\u0026rsquo;t tell what\u0026rsquo;s normal and what\u0026rsquo;s specific to the place you landed. Looking back now, I think I got lucky in the specific way that matters: I landed somewhere with real code, real scale, and a mentor who took the teaching part seriously.\nWhat CMoney Is CMoney is Taiwan\u0026rsquo;s leading financial platform — 7 million app downloads, core infrastructure for stock market data, portfolio tracking, and financial analysis tools used by retail investors across Taiwan. The backend is not a startup\u0026rsquo;s clean-slate codebase. It\u0026rsquo;s a decade of production history, accumulated decisions, and inherited constraints. When you work there, you feel the weight of that.\nI joined during an active microservice refactor. The goal was to extract business logic out of a monolith and into versioned service contracts. My job was to implement those refactors across nine backend APIs in the flagship product.\nLearning OOP the Hard Way I had learned object-oriented programming from textbooks. CMoney taught me what OOP actually buys you.\nThe codebase I inherited was written without clear separation of concerns. Business logic sat directly in controllers. Data access was scattered. State leaked between layers. 
The tests were green — the code was unmaintainable.\nRefactoring it meant understanding why it was written that way before I could improve it. And understanding that required me to first understand ASP.NET Core, the C# type system, and the patterns that were supposed to be there but weren\u0026rsquo;t.\nTom — my mentor — was patient with this in a way I didn\u0026rsquo;t fully appreciate until later. He didn\u0026rsquo;t just tell me what to change. He explained why the original design caused problems, what the intended pattern was, and how to migrate toward it without breaking the 7 million users on the other end. That\u0026rsquo;s a different skill than knowing the patterns. It\u0026rsquo;s knowing when they matter.\nBy month three I had stopped asking \u0026ldquo;what should I do here\u0026rdquo; and started asking \u0026ldquo;what does this code assume, and is that assumption still valid.\u0026rdquo; That shift was the real education.\nFinancial APIs at Scale The domain had its own challenges. Financial data at market open is not normal traffic. Millions of users simultaneously refreshing portfolios, running screeners, checking live prices. Concurrency edge cases that only appear at that scale and only at that time window. You learn quickly that \u0026ldquo;it worked in testing\u0026rdquo; and \u0026ldquo;it works under real load\u0026rdquo; are different claims.\nI got deeply comfortable with high-concurrency patterns in C# — async/await, connection pooling, caching strategy for time-sensitive data — because the alternative was an incident at 9:00 AM on a trading day.\nThe Last Internship Before the Shift I wrapped up at CMoney in April 2025. Two or three months later, something in the industry started moving fast.\nAgentic programming — where an AI agent handles end-to-end implementation from spec to code to test — went from a demo to a real workflow used by real engineers. 
The shape of \u0026ldquo;junior dev work\u0026rdquo; started changing in real time.\nI don\u0026rsquo;t know exactly what this means for early-career engineering yet. But I do know that the nine months I spent manually inheriting broken production code, tracing bugs through three service layers at 9 AM, and arguing with my mentor about whether this abstraction was worth the indirection — that kind of learning doesn\u0026rsquo;t get abstracted away by an agent. Understanding what the code is doing and whether the design is sound is still the hard part.\nMaybe what changed is that the mechanical parts get faster. The judgment part doesn\u0026rsquo;t.\nThank You To Tom, for teaching me ASP.NET, C#, and OOP the way it should be taught — through real code with real consequences. For having the patience to explain the why at every step.\nTo the team, for the massive amount of work that ran through CMoney\u0026rsquo;s history before I arrived, and for letting me contribute a small piece of it.\nThis chapter\u0026rsquo;s done. Onwards.\n","permalink":"https://lien0214.github.io/en/posts/leaving-cmoney/","summary":"\u003ch2 id=\"the-end-of-the-first-one\"\u003eThe End of the First One\u003c/h2\u003e\n\u003cp\u003eApril 20th. My last day at CMoney after nine months as a Backend Development Engineer Intern.\u003c/p\u003e\n\u003cp\u003eFirst internships are strange. You don\u0026rsquo;t know what you don\u0026rsquo;t know, so you can\u0026rsquo;t tell what\u0026rsquo;s normal and what\u0026rsquo;s specific to the place you landed. Looking back now, I think I got lucky in the specific way that matters: I landed somewhere with \u003cem\u003ereal\u003c/em\u003e code, \u003cem\u003ereal\u003c/em\u003e scale, and a mentor who took the teaching part seriously.\u003c/p\u003e","title":"My First Internship at CMoney — and Why It Might Have Been the Last of Its Kind"},{"content":"\nWhat It Is A 2D pixel platformer inspired by Mario and Kirby. 
You play as Bubblo — a bubble creature exploring a Wonderland, rescuing villagers trapped in cages, and fighting off a cast of needle-type enemies: bees, jumping spiders, and a unicorn that very much wants to pop you.\nThe core mechanic is bubble physics. You bounce, float, and use your bubble properties both for traversal and combat. Five levels, stable 60fps, a full game loop with lives and scoring.\nMy Role I was project lead — which in practice meant translating loose PM-level ideas into actual, shippable game details, then keeping six people moving in the same direction for three months.\nBefore a single line of game code was written, I designed the component architecture. We used Observer, State Machine, and Command patterns deliberately — not because it felt clever, but because with six people in the same codebase, implicit coupling kills velocity fast. Halfway through the sprint we ran a full refactor of the avatar behaviour system specifically to enforce these patterns more cleanly.\nThe workflow side was its own challenge. I came in with a surface-level understanding of agile from my CMoney internship — standups, sprint planning, retrospectives. None of that transferred cleanly to a student team. People have lab work, other coursework, clubs, research. The rigid ceremony doesn\u0026rsquo;t fit. So I stripped it down to something lighter: a shared task board, short async check-ins, and one weekly sync where the only goal was unblocking people.\nWe used ChatGPT throughout — no IDE agents existed yet. Looking back, it\u0026rsquo;s interesting how much the workflow has changed in a year.\nWhat Shipped 600+ commits across the team. Five playable levels with distinct enemy patterns. Full audio. Win/lose states. A polished title screen. 
The demo runs at stable 60fps in WebGL.\nPlay it in your browser →\n","permalink":"https://lien0214.github.io/en/projects/bubblo/","summary":"\u003cp\u003e\u003cimg alt=\"Bubblo title screen\" loading=\"lazy\" src=\"/images/projects/bubblo-cover.png\"\u003e\u003c/p\u003e\n\u003ch2 id=\"what-it-is\"\u003eWhat It Is\u003c/h2\u003e\n\u003cp\u003eA 2D pixel platformer inspired by Mario and Kirby. You play as Bubblo — a bubble creature exploring a Wonderland, rescuing villagers trapped in cages, and fighting off a cast of needle-type enemies: bees, jumping spiders, and a unicorn that very much wants to pop you.\u003c/p\u003e\n\u003cp\u003eThe core mechanic is bubble physics. You bounce, float, and use your bubble properties both for traversal and combat. Five levels, stable 60fps, a full game loop with lives and scoring.\u003c/p\u003e","title":"Bubblo — 2D Platformer"},{"content":"The Task Bidirectional translation between emoji sequences and Traditional Mandarin Chinese — not dictionary lookup, but contextual understanding of how emojis carry meaning in Taiwan internet culture.\nDataset Built from scratch by crowdsourcing emoji-to-Mandarin mappings and applying rule-based augmentation. Key challenge: emoji meaning is deeply contextual and varies by platform, age group, and Taiwan-specific internet slang. Three dataset splits with different algorithmic/GPT-generated data ratios (20/80, 50/50, 80/20) to evaluate data composition effects.\nModels and Results Fine-tuned Taiwan-LLaMA and Google mT5 using LoRA with RLHF-style preference optimization.\nmT5 outperformed LLaMA on this task due to its explicit multilingual pretraining on Traditional Chinese text.\nMost interesting finding: models that scored well on BLEU performed noticeably worse on human evaluation for \u0026ldquo;feels natural.\u0026rdquo; A clean example of reward hacking — the metric diverges from the goal. 
We added METEOR evaluation and human annotation rounds to catch this gap.\n","permalink":"https://lien0214.github.io/en/projects/e-motchi/","summary":"\u003ch2 id=\"the-task\"\u003eThe Task\u003c/h2\u003e\n\u003cp\u003eBidirectional translation between emoji sequences and Traditional Mandarin Chinese — not dictionary lookup, but contextual understanding of how emojis carry meaning in Taiwan internet culture.\u003c/p\u003e\n\u003ch2 id=\"dataset\"\u003eDataset\u003c/h2\u003e\n\u003cp\u003eBuilt from scratch by crowdsourcing emoji-to-Mandarin mappings and applying rule-based augmentation. Key challenge: emoji meaning is deeply contextual and varies by platform, age group, and Taiwan-specific internet slang. Three dataset splits with different algorithmic/GPT-generated data ratios (20/80, 50/50, 80/20) to evaluate data composition effects.\u003c/p\u003e","title":"E-MoTchi — Emoji ↔ Mandarin NLP"},{"content":" MS CS, National Taiwan University (2025–2027). BS CS, National Taiwan University (2021–2025, graduated).\nI\u0026rsquo;m in the first year of my master\u0026rsquo;s at NTU, simultaneously working in two research labs that cover very different parts of the AI space:\nMIRLab (Multimedia Information Retrieval Lab) — where I\u0026rsquo;m doing quantitative trading research on TWSE\nMSLab (Machine Intelligence and Agentic System Lab) — where my thesis work focuses on knowledge conflicts and memory in large language models\nBefore grad school, I spent about 18 months as a backend engineer intern at Shopback and CMoney. 
I was also a research assistant at Academia Sinica\u0026rsquo;s Citi TACC lab, working on synthetic data generation for cybersecurity.\nResearch Interests\nKnowledge conflict in LLMs: what happens when parametric knowledge (from pretraining) conflicts with contextual knowledge (from fine-tuning or prompting), and how to resolve it — including via activation steering without weight updates\nRAG safety: robustness of retrieval-augmented systems against adversarial or conflicting context; how models decide when to trust retrieved content vs. parametric memory\nDL on quantitative trading: applying deep learning to systematic strategies on TWSE — sequence models for signal generation, learned feature combination, and cross-asset transfer\nIndustry Interests\nAI Engineering for LLM: building production LLM systems — inference optimization, prompt pipelines, evaluation frameworks, and deployment at scale\nInfrastructure build-up: designing and scaling backend infrastructure for AI products — from serving layers to observability and reliability engineering\nBackend systems: distributed systems, high-concurrency APIs, microservice architecture — the engineering foundation that makes AI products actually work\nEducation\nMS, Computer Science, National Taiwan University, 2025–2027 · Expected Graduation 2027\nBS, Computer Science, National Taiwan University, 2021–2025 · Graduated\nLabs: MIRLab (Multimedia Information Retrieval Lab) · MSLab (Machine Intelligence and Agentic System Lab)\nExperience Timeline\nPeriod · Role · Company\nMay 2025 – Nov 2025 · Software Engineer Intern · Shopback\nJul 2024 – Apr 2025 · Backend Dev Engineer Intern · CMoney\nSep 2023 – Jun 2024 · Research Assistant (UG) · Academia Sinica, Citi Lab\nFeb 2023 – Jun 2023 · Teaching Assistant, DSA · NTU Dept. 
of CS\nTechnical Profile\nMost fluent in: TypeScript, Python, C#\nBackend stack I know well: ASP.NET Core, Node.js, Kubernetes, Docker, MongoDB, PostgreSQL\nML tools I use regularly: PyTorch, HuggingFace Transformers, PEFT/LoRA, Activation Steering\nCurrently learning: Go, more CUDA-level attention mechanisms, TWSE market microstructure\nContact\nted20030214@gmail.com · GitHub · LinkedIn · (+886) 902-323-591\n","permalink":"https://lien0214.github.io/en/about/","summary":"about","title":"About"},{"content":"About Bubblo\nBubblo is a complete 2D platformer game demo built in Unity over one semester by a 6-person team. I led the project as both the technical architect and project manager.\nWhat I actually built:\nDesigned the component architecture before any game code was written — Observer, State Machine, and Command patterns kept 6 people from stepping on each other\nRan weekly standups and maintained the task board; most of the PM work was unblocking teammates\nThe demo runs at stable 60fps with 5 playable levels, full audio, and a proper game loop (lives, scoring, win/lose states)\nControls: Arrow keys or WASD to move, Space to jump. Click the fullscreen button for the best experience.\nThe game may take 30–60 seconds to load depending on your connection — the WebGL build is ~80MB.\n","permalink":"https://lien0214.github.io/en/game/","summary":"\u003ch2 id=\"about-bubblo\"\u003eAbout Bubblo\u003c/h2\u003e\n\u003cp\u003eBubblo is a complete 2D platformer game demo built in Unity over one semester by a 6-person team. 
I led the project as both the technical architect and project manager.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eWhat I actually built:\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eDesigned the component architecture before any game code was written — Observer, State Machine, and Command patterns kept 6 people from stepping on each other\u003c/li\u003e\n\u003cli\u003eRan weekly standups and maintained the task board; most of the PM work was unblocking teammates\u003c/li\u003e\n\u003cli\u003eThe demo runs at stable 60fps with 5 playable levels, full audio, and a proper game loop (lives, scoring, win/lose states)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eControls:\u003c/strong\u003e Arrow keys or WASD to move, Space to jump. Click the fullscreen button for the best experience.\u003c/p\u003e","title":"Bubblo — 2D Platformer"},{"content":"Feel free to reach out for research discussions, collaboration, or just to say hi.\nEmail: ted20030214@gmail.com\nLinkedIn: linkedin.com/in/yi-wei-lien\nGitHub: github.com/lien0214\nPhone: (+886) 902-323-591
","permalink":"https://lien0214.github.io/en/contact/","summary":"contact","title":"Contact"},{"content":"Education\nMS, Computer Science, National Taiwan University · 2025–2027 · Expected Graduation 2027\nBS, Computer Science, National Taiwan University · 2021–2025 · Graduated\nLabs: MIRLab (Multimedia Information Retrieval Lab) · MSLab (Machine Intelligence and Agentic System Lab)\nWork Experience\nSoftware Engineer Intern · Shopback\nMay 2025 – Nov 2025 · Taipei, Taiwan · Hybrid\nTypeScript Kubernetes Monorepo SQL CI/CD\nMigrated 10+ API calls to direct function calls during a monorepo migration, cutting per-call latency from ~100ms to under 10ms\nDeveloped features handling 1M+ API requests monthly, boosting user interest by 15–25%\nResolved 5+ backend issues, reducing the error rate of the affected APIs by 10–15%\nBackend Development Engineer Intern · CMoney\nJul 2024 – Apr 2025 · New Taipei City, Taiwan · Hybrid\nC# ASP.NET OOP Microservice MongoDB High-Concurrency\nImplemented microservice refactors for 9 backend APIs in a flagship product (7M+ downloads)\nResearch Assistant · Citi Lab, Academia Sinica\nSep 2023 – Jun 2024 · Nangang, Taiwan · Remote\nPython Cybersecurity Synthetic Data Linux Audit Logs\nBuilt a data processing system for 200M+ Linux audit logs supporting GAN-based cybersecurity detection research\nTeaching Assistant · Dept. 
of CS, NTU\nFeb 2023 – Jun 2023 · Taipei, Taiwan · On-site\nData Structures Algorithms\nProjects\nICC Mem/Gen · MSLab\nMar 2026 – May 2026 · 4-person Team · 4-month Sprint\nKnowledge Conflict SFT PEFT LoRA Activation Steering\nDeveloped a cross-format data augmentation pipeline for ConflictQA/ConflictBank\nUsed Activation Steering to prove that fine-tuned behavioral vectors can steer frozen base models\nAlgorithm Trading Project · MIRLab\nJan 2026 – May 2026 · 3-person Team · Ongoing\nQuant TWSE Daily Strategy\nBubblo · 2D Platformer\nApr 2025 – Jun 2025 · 6-person Team · Project Lead · 600+ Commits\nUnity C# OOP Design Patterns\nDesigned the component architecture before any game code was written, using Observer, State Machine, and Command patterns explicitly\nLed weekly standups and maintained the task board across a 3-month sprint\nE-MoTchi · NLP\nNov 2024 – Dec 2024 · 5-person Team · 40-day Sprint\nNLP Taiwan-LLaMA Google mT5 LoRA RLHF\nFine-tuned multilingual LLMs using LoRA with RLHF-style preference optimization for emoji-to-Mandarin translation\nSkills\nMost fluent in: TypeScript · Python · C#\nBackend: ASP.NET Core · Node.js · Kubernetes · Docker · MongoDB · PostgreSQL\nML/AI: PyTorch · HuggingFace Transformers · PEFT/LoRA · Activation Steering\nSoft Skills: Agile workflows · Project management · Research\nLanguages: Mandarin (Native) · English (Working Proficiency)\n","permalink":"https://lien0214.github.io/en/resume/","summary":"resume","title":"Resume"}]