Graph partitioning and repartitioning have been studied for decades in the literature. Yet they are receiving renewed attention due to the increasing prevalence of large graphs from various domains, such as social networks, web networks, telecommunication networks, and scientific simulations. Traditional, well-studied graph (re)partitioners often scale poorly to these continuously growing graphs. Recent work on streaming graph partitioning and lightweight graph repartitioning usually assumes a homogeneous computing environment. Unfortunately, modern parallel architectures may exhibit highly non-uniform network communication costs. A few solutions have been proposed to address this, but they all treat the network as the primary bottleneck of the system, even though transferring data across modern high-speed networks can now approach the speed of local memory access. As such, always trying to minimize network data communication may not be the best choice. We found that concentrating too much data communication in partitions assigned to cores of the same machine can cause serious contention for the shared hardware resources of the memory subsystem (e.g., the last-level cache, memory controller, and front-side bus) in modern multicore clusters. The performance impact of this contention can even become the dominant factor limiting the scalability of the workload, especially on multicore machines connected via high-speed networks. Another issue with existing graph (re)partitioners is that they are usually unaware of the runtime characteristics of the target workload. To enable efficient distributed graph computation, this thesis aims to design and implement new scalable graph (re)partitioners that take into account the non-uniform network communication costs of the underlying computing infrastructure, the contentiousness of the memory subsystems of modern multicore machines, and the runtime characteristics of the target workload.